Abstract


This paper presents a new combined pointer and escape 
analysis algorithm for Java programs with unstructured mul- 
tithreading. The algorithm is based on the abstraction of 
parallel interaction graphs, which characterize the points-to 
and escape relationships between objects and the ordering 
relationships between actions performed by multiple par- 
allel threads. To our knowledge, this algorithm is the first 
interprocedural, flow-sensitive pointer analysis algorithm ca- 
pable of extracting the points-to relationships generated by 
the interactions between unstructured parallel threads. It 
is also the first algorithm capable of analyzing interactions 
between threads to extract precise escape information even 
for objects accessible to multiple threads. 

We have implemented our analysis in the IBM Jalapeño
dynamic compiler for Java and used the analysis results 
to eliminate redundant synchronization. For our bench- 
mark programs, the thread interaction analysis significantly 
improves the effectiveness of the synchronization elimina- 
tion algorithm as compared with previously published tech- 
niques, which do not analyze these interactions. 


1 Introduction 


This paper presents a new, combined pointer and escape 
analysis algorithm for multithreaded programs. To our knowl- 
edge, this algorithm is the first interprocedural, flow-sensitive 
pointer analysis algorithm for programs with the unstruc- 
tured form of multithreading present in Java and similar 
languages. It is also the first algorithm capable of analyz- 
ing interactions between threads to extract precise escape 
information even for objects accessible to multiple threads. 


1.1 Analysis Overview 


The analysis is based on an abstraction we call parallel in- 
teraction graphs. The nodes in this graph represent objects; 
the edges between nodes represent references between ob- 
jects. For each node, the analysis also records information 
that characterizes how it escapes the current analysis region. 
For example, an object escapes if it is reachable from an
unanalyzed thread running in parallel with the current thread
or returned to an unanalyzed region of the program.

Combining points-to and escape information in the same 
analysis enables the algorithm to represent all potential in- 
teractions between the analyzed and unanalyzed regions of 
the program. The algorithm represents these interactions, 
in part, by distinguishing between two kinds of edges: in- 
side edges, which represent references created within the cur- 
rently analyzed region, and outside edges, which represent 
references created outside this region. Each outside edge 
represents a potential interaction in which the analyzed re- 
gion reads a reference created in an unanalyzed region. Each 
inside edge from an escaped node represents a potential
interaction in which the analyzed region creates a reference
that an unanalyzed region may read.

Representing potential interactions with inside and out- 
side edges leads to an analysis that is compositional in two 
senses: 


• Method Compositionality: The algorithm analyzes
each method once to derive a single parameterized
analysis result that records all potential interactions of
the method with its callers.¹ At each call site, the
algorithm matches outside edges from the callee against
inside edges from the caller to compute the effect of
the callee on the points-to and escape information of
the caller.


• Thread Compositionality: The algorithm analyzes
each thread once to derive an analysis result that records 
all of the potential interactions of the thread with other 
parallel threads. The analysis can then combine analy- 
sis results from multiple parallel threads by matching 
each outside edge from each thread against all cor- 
responding inside edges from parallel threads. The 
result is a single parallel interaction graph that com- 
pletely characterizes the points-to and escape informa- 
tion generated by the combined parallel execution of 
the threads. Unlike previously published algorithms, 
which use an iterative fixed-point algorithm to com- 
pute the interactions [37, 16], the algorithm presented 
in this paper can compute the interactions between 
two parallel threads with a single pass over the paral- 
lel interaction graphs from the threads. 


Finally, the combination of points-to and escape information
in the same analysis leads to an algorithm that is
designed to analyze arbitrary regions of complete or
incomplete programs, with the analysis result becoming more
precise as more of the program is analyzed. At every stage in
the analysis, the current parallel interaction graph provides
complete information about the points-to relationships for
objects that do not escape the currently analyzed region
of the program. The algorithm can therefore obtain useful
analysis information without analyzing the entire program.


¹ Recursive methods require an iterative algorithm that may
analyze methods multiple times to reach a fixed point.


1.2 Analysis Uses


Parallel interaction graphs also record the actions that each 
thread performs and contain ordering information for these 
actions relative to the actions performed by other threads. 
Optimization and analysis algorithms can use this informa- 
tion to determine that actions from different threads can 
never execute concurrently and are therefore independent. 
Our compiler uses this ordering information to improve the 
precision of the thread interaction analysis. It also uses this 
information to implement a synchronization elimination op- 
timization — if all lock acquire and release actions on a 
given object are independent, they have no effect on the 
computation and can be removed. 

The analysis also provides information that is generally 
useful to compilers and program analysis tools for multi- 
threaded programs. Potential applications of our analysis 
include: sophisticated software engineering tools such as 
static race detectors and program slicers [28, 36]; memory 
system optimizations such as prefetching and moving com- 
putation to remote data; automatic batching of long latency 
file system operations; memory bank disambiguation in com- 
pilers for distributed memory machines [8]; memory module 
splitting in compilers that generate hardware directly from 
high-level languages [6]; lock coarsening [33, 22];
synchronization elimination and stack allocation [41, 10, 12, 15];
and providing the information required to apply traditional
compiler optimizations such as constant propagation, common
subexpression elimination, register allocation, code motion,
and induction variable elimination to multithreaded
programs.


1.3 Contributions


This paper makes the following contributions: 


• Analysis Algorithm: It presents a new combined
pointer and escape analysis algorithm for multithreaded 
programs. The algorithm is compositional at both the 
method and thread levels and is designed to deliver 
useful information without analyzing the entire pro- 
gram. 


• Analysis Uses: It shows how to use the action
ordering information present in parallel interaction graphs
to perform a synchronization elimination optimization. 


• Experimental Results: It presents experimental
results from a prototype implementation of the
rithms. These results show that the algorithm can 
eliminate a significant number of synchronization op- 
erations. 


The remainder of the paper is organized as follows. Sec- 
tion 2 presents an example that illustrates how the analy- 
sis works. Sections 3 through 11 present the analysis algo- 
rithms. Section 14 presents experimental results, Section 15 
presents related work, and we conclude in Section 16. 


2 Example 


In this section we present an example that illustrates how 
the analysis works. Figure 1 presents the Java code for the 
example. The sum method in the Sum class computes the 
sum of the numbers from 0 to n, storing the result into a 
destination accumulator a. It computes this sum by creating 
a worker thread to compute the sum of the even numbers 
while it computes the sum of the odd numbers. When they 
finish, both threads add their contribution into the destina- 
tion accumulator. The sum method first constructs a work 
vector v of Integers for the worker thread to sum up, then 
initializes the worker object to point to the work vector and 
the destination accumulator. It starts the worker thread 
running by invoking its start method, which invokes the 
run method in a new thread running in parallel with the 
current thread. This mechanism of initializing a thread ob- 
ject to point to its conceptual parameters is the standard 
way for Java programs to provide threads with the informa- 
tion they need to initiate their computation. 


class Accumulator {
  int value = 0;
  synchronized void add(int v) {
    value += v;
  }
}
class Sum {
  public static void sum(int n, Accumulator a) {
1:  Vector v = new Vector();
    for (int i = 0; i < n; i += 2) {
      v.addElement(new Integer(i));
    }
2:  Worker t = new Worker();
    t.init(v, a);
    t.start();
    int s = 0;
    for (int i = 1; i < n; i += 2) {
      s = s + i;
    }
    a.add(s);
  }
}
class Worker extends Thread {
  Vector work;
  Accumulator dest;
  void init(Vector v, Accumulator a) {
    work = v;
    dest = a;
  }
  public void run() {
3:  Enumeration e = work.elements();
    int s = 0;
    while (e.hasMoreElements()) {
      Integer i = (Integer) e.nextElement();
      s = s + i.intValue();
    }
4:  dest.add(s);
  }
}


Figure 1: Sum Example 


We contrast the unstructured form of multithreading in
this example with the structured, fork-join form of
multithreading found in, for example, the Cilk programming
language [11]. Once the thread in the example is created, it
executes independently of its parent thread and in parallel with
the rest of its parent thread’s computation. Cilk threads, on
the other hand, must join with their parent thread,
completing before their parent thread returns from the procedure in
which they were spawned.


[Figure 2: Callee-Caller Interaction Between init and sum.
Panels: points-to graph from init; mapping from callee to caller;
points-to graph before call to init; points-to graph after call
to init. Legend: inside edge, outside edge, inside node,
outside node, mapping.]


[Figure 3: Parallel Thread Interaction Between run and sum.
Panels: points-to graph from run; actions from run; mapping
between parallel threads; points-to graph, actions, threads, and
ordering from sum. Legend: inside edge, outside edge, inside
node, outside node.]


[Figure 4: Result of Parallel Thread Interaction.
Panels: points-to graph; actions; threads; ordering.
Recoverable entries: actions (sync, 1), (sync, 0, 2), (sync, 1, 2);
ordering (sync, 0)||2.]

We now illustrate the analysis by discussing its opera- 
tion on this example. We start with the init method. This 
method is passed the work vector and destination accumula- 
tor and initializes the worker to point to them. The analysis 
result for this method is a parallel interaction graph. Fig- 
ure 2 contains the points-to graph from the parallel interac- 
tion graph at the end of the init method. In general, our 
points-to graphs contain two kinds of nodes: inside nodes, 
which represent objects created during the computation of 
the method, and outside nodes, which represent objects cre- 
ated outside its computation. All of the nodes in the points- 
to graph for the init method represent the receiver or ob- 
jects passed into the method as parameters. These nodes are 
therefore outside nodes. Our points-to graphs also contain 
two kinds of edges: inside edges, which represent references 
created during the computation of the method, and out- 
side edges, which represent references created outside the 
computation. Because the init method reads no references 
created by other methods or threads, all of the edges in its 
points-to graph are inside edges. 


2.1 Interaction Between Caller and Callee 


Figure 2 also presents the points-to graph from the sum 
method just before the call to init. This graph contains 
one outside node (node 0), which represents the destination 
accumulator passed as a parameter to sum. It also contains 
two inside nodes — node 1, which represents the work vec- 
tor, and node 2, which represents the worker thread object. 
Each inside node corresponds to an object creation site and 
represents all objects created at that site. In the example, we 
label inside nodes with the line number of the corresponding 
object creation site in Figure 1. 

We next discuss how the analysis combines this points-to 
graph with the points-to graph from the init method to de- 
rive the points-to graph after the call to init. The algorithm 
uses the correspondence between the formal and actual pa- 
rameters to construct a mapping from the outside nodes of 
the init method to the nodes of the sum method. This 
mapping is then used to translate the inside edges from the 
init method into the points-to graph from the sum method. 
Figure 2 presents the result of this mapping, which yields 
the points-to graph after the call to init. 
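
The mapping step for the call to init can be sketched in Java as follows. This is a minimal illustration under simplifying assumptions, not the implementation described in this paper: nodes are plain strings, the names CallMappingSketch, Edge, mapInsideEdges, and initExample are hypothetical, the parameter nodes of init are called "this", "v", and "a" purely for illustration, and the mapping covers only parameter nodes, whereas the full algorithm also matches outside edges and updates escape information.

```java
import java.util.*;

final class CallMappingSketch {
    /** A field edge ((source, field), target) between nodes named by strings. */
    static final class Edge {
        final String source, field, target;
        Edge(String source, String field, String target) {
            this.source = source; this.field = field; this.target = target;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge e = (Edge) o;
            return source.equals(e.source) && field.equals(e.field) && target.equals(e.target);
        }
        @Override public int hashCode() { return Objects.hash(source, field, target); }
        @Override public String toString() { return source + "." + field + " -> " + target; }
    }

    /** Translates the callee's inside edges into the caller through the node mapping. */
    static Set<Edge> mapInsideEdges(Set<Edge> calleeInside, Map<String, Set<String>> mapping) {
        Set<Edge> result = new LinkedHashSet<>();
        for (Edge e : calleeInside) {
            for (String src : mapping.getOrDefault(e.source, Collections.<String>emptySet())) {
                for (String dst : mapping.getOrDefault(e.target, Collections.<String>emptySet())) {
                    result.add(new Edge(src, e.field, dst));
                }
            }
        }
        return result;
    }

    /** The init/sum example: init's parameter nodes are mapped to sum's nodes 2, 1, and 0. */
    static Set<Edge> initExample() {
        Set<Edge> calleeInside = new LinkedHashSet<>();
        calleeInside.add(new Edge("this", "work", "v")); // work = v
        calleeInside.add(new Edge("this", "dest", "a")); // dest = a
        Map<String, Set<String>> mapping = new HashMap<>();
        mapping.put("this", Collections.singleton("2")); // worker thread object
        mapping.put("v", Collections.singleton("1"));    // work vector
        mapping.put("a", Collections.singleton("0"));    // destination accumulator
        return mapInsideEdges(calleeInside, mapping);
    }
}
```

Under these assumptions, initExample() yields the two inside edges that the call adds to the graph of sum: one from node 2 through field work to node 1, and one from node 2 through field dest to node 0.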


2.2 Interaction Between Parallel Threads 


We next discuss how the analysis computes the interaction 
between the two threads in the example. Figure 3 presents 
the parallel interaction graph from the end of the worker 
thread’s run method. This method loads the work and dest 
references from the receiver object. Because these references 
were created outside the run method, the analysis uses an 
outside edge to represent each reference. Each of these
outside edges points to a specific kind of outside node called a load
node. In general, there is one load node for each statement 
in the program that loads a reference from an escaped ob- 
ject; that load node represents all of the objects to which the 
reference may point. In Figure 3, we have labeled the load 
nodes with the number of the corresponding load statement 
from Figure 1. 


In addition to the points-to information, the parallel in- 
teraction graph also records the synchronization actions that 
the method performs and the objects to which the actions 
are applied. In this case, the run method synchronizes on 
the work vector and the destination accumulator. The ac- 
tions (sync, 3) and (sync, 4) record these synchronizations. 
The synchronization on the work vector happens inside the 
enumeration’s nextElement method — Java library classes 
such as Vector often come with the synchronization required 
for correct execution in the face of concurrent access by par- 
allel threads. In this case, however, the synchronizations 
are unnecessary. Even though the work vector is accessed 
by multiple threads, the accesses are separated temporally 
by thread start events. Among other things, this example 
will show how the analysis detects the independence of the 
synchronization actions on the work vector. 

We next move to the parallel interaction graph at the end 
of the sum method. In addition to the points-to and action 
information, the graph records the threads that the method 
starts and ordering information between the method’s
actions and the started threads. In this case, sum starts the worker
thread, which is represented in the analysis by node 2. The 
ordering relation (sync, 0)||2 records the fact that a syn- 
chronization action on object 0 (the destination accumula- 
tor) may execute in parallel with the actions of the worker 
thread. Note that there is no parallel ordering relation be- 
tween the worker thread and sum’s synchronization actions 
on the work vector object. This absence indicates that all 
of these actions occur before the worker thread starts, and 
therefore do not execute in parallel with any of the thread’s 
actions. 

To model the parallel interaction, the analysis constructs 
a bidirectional mapping between the nodes of the parallel 
interaction graphs. Initially, the receiver object of the run 
method is mapped to the worker thread object from the sum 
method. The analysis then matches outside edges from one 
graph to corresponding inside edges from the other graph, 
using the match to extend the mapping from outside nodes 
to inside nodes. When the mapping is complete, it is used 
to combine the graph from the run method into the graph 
from the sum method. Figure 4 presents the combined graph. 
In addition to the points-to information, the combination 
algorithm also translates the synchronization actions from 
the worker thread into the new parallel interaction graph. It 
tags each action with the thread that performs the action. 
So, for example, the action (sync, 0, 2) indicates that node 2
(the worker thread’s node) may perform a synchronization 
action on an object represented by node 0. 

The run method also invokes the work.elements() method, 
which creates an enumerator object to enumerate through 
elements of the vector. The enumerator object is represented 
by an inside node. Note that because the enumeration ob- 
ject is captured in the run method, it cannot affect the com- 
putation outside this method. Therefore, the algorithm does 
not transfer its inside node into the combined graph. 


2.3 Information in the Combined Graph


The compiler can extract the following information from the
combined graph. First, the work vector node is captured in
this graph. Even though it is accessed by multiple threads,
it is not reachable from outside the total computation of
the sum method. The combined graph therefore completely
characterizes the points-to information and actions
involving the work vector object. Second, both the thread
executing the sum method and the worker thread synchronize
on the work vector object. But the ordering information
indicates that none of these synchronizations can occur
concurrently. The synchronization actions have no effect on the
computation and can therefore be removed.² Finally, both
threads synchronize on the destination accumulator object.
But in this case, the synchronization actions from the sum
thread may execute in parallel with the synchronization
actions from the worker thread. The compiler will not be able
to remove the synchronization from the destination
accumulator object.
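
The decision described above can be sketched as a small Java check. This is a deliberately simplified illustration, not the compiler's actual test: the class and method names are hypothetical, nodes are strings, and the check considers only conflicts between the current thread and each started thread (it ignores, for example, several started threads or multiple instances of one thread locking the same object).

```java
import java.util.*;

final class SyncEliminationSketch {
    /**
     * Simplified independence check (an assumption of this sketch).
     *
     * @param node          the node whose synchronizations we would like to remove
     * @param captured      nodes that do not escape the analyzed region
     * @param threadSyncs   started thread node t -> nodes n with an action (sync, n, t)
     * @param parallelSyncs started thread node t -> nodes n such that the current
     *                      thread's action (sync, n) may run in parallel with t
     */
    static boolean canRemoveSync(String node, Set<String> captured,
                                 Map<String, Set<String>> threadSyncs,
                                 Map<String, Set<String>> parallelSyncs) {
        // An escaped object may be locked by code the analysis has not seen.
        if (!captured.contains(node)) return false;
        for (Map.Entry<String, Set<String>> entry : threadSyncs.entrySet()) {
            String thread = entry.getKey();
            boolean threadLocksNode = entry.getValue().contains(node);
            boolean mayOverlap = parallelSyncs
                    .getOrDefault(thread, Collections.<String>emptySet())
                    .contains(node);
            // Both threads lock the node and the locks may overlap in time: keep them.
            if (threadLocksNode && mayOverlap) return false;
        }
        return true;
    }
}
```

With the ordering information from the example, where only (sync, 0) may run in parallel with thread node 2, the check accepts node 1 (the work vector) and rejects node 0 (the destination accumulator).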


3 Analysis Abstractions


In this section we formally present the basic abstractions 
that the analysis uses: the program and object represen- 
tations, points-to escape graphs, and parallel interaction 
graphs. 


3.1 Program Representation 


The algorithm represents the program using the following
analysis objects. There is a set $l \in L$ of local variables and
a set $p \in P$ of formal parameter variables. There is one
formal parameter variable for each formal parameter of each
method in the program. There is also a set $cl \in CL$ of class
names. The analysis models static class variables using a
one-of-a-kind node for each class; the fields of this node are
the static class variables for the corresponding class. The
analysis therefore treats the class name $cl$ as a read-only
variable that points to the corresponding one-of-a-kind node
that contains the class’s static class variables. Together, the
local, formal parameter, and class name variables make up
the set $v \in V = L \cup P \cup CL$ of variables. There is also a set
$f \in F$ of object fields and a set $op \in OP$ of methods.
Object fields are accessed using syntax of the form $v.f$. Static
class variables are accessed using syntax of the form $cl.f$.
Each method has a receiver class $cl$ and a formal parameter
list $p_0, \ldots, p_k$. We adopt the convention that parameter $p_0$
points to the receiver object of the method.

The algorithm represents the computation of each method
using a control flow graph. The nodes of these graphs are
statements $st \in ST$. We assume the program has been
preprocessed so that all statements relevant to the analysis are
in one of the following forms:


• A copy statement $l = v$.

• A load statement $l_1 = l_2.f$.

• A store statement $l_1.f = l_2$.

• A monitor acquire statement acquire($l$).

• A monitor release statement release($l$).

• A return statement return $l$, which identifies the
return value $l$ of the method.

• An object creation site of the form $l$ = new $cl$.

• A method invocation site $m \in M$ of the form
$l = l_0.op(l_1, \ldots, l_k)$.

• A thread start site of the form $l$.start().


² To satisfy the Java memory model, the compiler may have to
leave memory barriers behind at the old synchronization points. 


The analysis represents the control flow relationships
between statements as follows: pred($st$) is the set of
statements that may execute immediately before $st$, and succ($st$)
is the set of statements that may execute immediately after
$st$. There are two program points for each statement $st$:
the program point $\bullet st$ immediately before $st$ executes, and
the program point $st \bullet$ immediately after $st$ executes. The
control flow graph for each method $op$ starts with an enter
statement $enter_{op}$ and ends with an exit statement $exit_{op}$.

The interprocedural analysis uses call graph information 
to compute sets of methods that may be invoked at method 
invocation sites. For each method invocation site $m \in M$,
callees(m) is the set of methods that m may invoke. Given a 
method op, callers(op) is the set of method invocation sites 
that may invoke op. The current implementation obtains 
this call graph information using a variant of class hierarchy 
analysis [17], but the algorithm can use any conservative 
approximation to the actual call graph generated when the 
program runs. 


3.2 Object Representation 


The analysis represents the objects that the program
manipulates using a set $n \in N$ of nodes. There are several
kinds of nodes:


• There is a set $N_I$ of inside nodes. Inside nodes
represent inside objects, which are objects created within
the current analysis scope and accessed via references
created within the current analysis scope. This set
consists of two subsets:


  – Nodes in $\hat{N}_I$ represent objects created by the
  current thread. There is one node in $\hat{N}_I$ for each
  object creation site; that node represents all objects
  created at that site by the current thread.


  – Nodes in $\overline{N}_I$ represent objects created by threads
  running in parallel with the current thread. There
  is one node in $\overline{N}_I$ for each object creation site;
  that node represents all objects created at that
  site by threads running in parallel with the
  current thread.


Two nodes $n_1 \in \hat{N}_I$ and $n_2 \in \overline{N}_I$ are corresponding
nodes if they represent objects created at the same
object creation site.


In Java, each thread corresponds to an object that
implements the Runnable interface. The set $N_T \subseteq N_I$
of runnable nodes represents runnable objects.
$\hat{N}_T \subseteq \hat{N}_I$ represents runnable objects created by the
current thread, and $\overline{N}_T \subseteq \overline{N}_I$ represents runnable objects
created by threads running in parallel with the current
thread. $N_T = \hat{N}_T \cup \overline{N}_T$.


• There is a set $N_O$ of outside nodes. Outside nodes
represent outside objects, which are objects created
outside the current analysis scope or accessed via
references created outside the current analysis scope. This
set consists of several subsets:


  – There is a set $N_L$ of load nodes. When a load
  statement executes, it loads a value from a field
  in an object. If the loaded value is a reference, the
  analysis must represent the object that the
  reference points to. Each load node represents outside
  objects whose references are loaded by the
  corresponding load statement. There are two kinds of
  load nodes:


    * $\hat{N}_L$ contains one node for each load statement
    in the program. That node represents outside
    objects whose references are loaded at that
    statement by the current thread.


    * $\overline{N}_L$ contains one node for each load statement
    in the program. That node represents outside
    objects whose references are loaded at that
    statement by threads running in parallel with
    the current thread.


  Two nodes $n_1 \in \hat{N}_L$ and $n_2 \in \overline{N}_L$ are corresponding
  nodes if they represent outside objects whose
  references are loaded at the same load statement.


  – There is a set of return nodes $N_R$. When the
  algorithm skips the analysis of a method invocation
  site, it uses a return node to represent the return
  value of the method invoked at that site. There
  are two kinds of return nodes:


    * $\hat{N}_R$ contains one node for each skipped method
    invocation site in the program. That node
    represents objects returned by methods
    invoked at that site by the current thread.


    * $\overline{N}_R$ contains one node for each skipped method
    invocation site in the program. That node
    represents objects returned by methods
    invoked at that site by threads running in
    parallel with the current thread.


  Two nodes $n_1 \in \hat{N}_R$ and $n_2 \in \overline{N}_R$ are corresponding
  nodes if they represent objects returned at the
  same skipped method invocation site.


  – $N_P$: There is one parameter node $n \in N_P$ for each
  formal parameter in the program. Each parameter
  node represents the object that its parameter
  points to during the execution of the analyzed
  method. The receiver object is treated as the first
  parameter of each method. Given a parameter $p$,
  the corresponding parameter node is $n_p$. There
  is always an inside edge from $p$ to $n_p$.


  – $N_C$: There is one class node $n \in N_C$ for each class
  in the program. The fields of this node represent
  the static class variables of its class. Given a class
  $cl$, the corresponding class node is $n_{cl}$. There is
  always an inside edge from $cl$ to $n_{cl}$.


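The set $\hat{N} = \hat{N}_I \cup \hat{N}_L \cup \hat{N}_R$, and the set
$\overline{N} = \overline{N}_I \cup \overline{N}_L \cup \overline{N}_R$.
Given a node $n \in \hat{N}$, $\overline{n}$ represents the corresponding node
in $\overline{N}$. Given a node $\overline{n} \in \overline{N}$, $n$ represents the corresponding
node in $\hat{N}$.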

The analysis represents each array with a single node. 
This node has a field elements, which represents all of the 
elements of the array. Because the points-to information for 
all of the array elements is merged into this field, the analysis 
does not make a distinction between different elements of the 
same array. 


3.3 Points-To Escape Graphs


A points-to escape graph is a quadruple of the form $(O, I, e, r)$,
where


• $O \subseteq (N \times F) \times N_L$ is a set of outside edges. Outside
edges represent references created outside the current
analysis scope, either by the caller, by a thread
running in parallel with the current thread, or by an
unanalyzed invoked method.


• $I \subseteq ((N \times F) \times N) \cup (V \times N)$ is a set of inside edges.
Inside edges represent references created inside the
current analysis scope.


• $e : N \to 2^{N_P \cup N_C \cup N_T \cup M}$ is an escape function that
records the escape information for each node. A node
escapes if it is reachable from a parameter node
$n_P \in N_P$, a static class variable represented by a field of a
class node $n_C \in N_C$, a thread node $n_T \in N_T$
running in parallel with the current thread, or an object
passed as a parameter to or returned from an
unanalyzed method invocation site.


• $r \subseteq N$ is a return set that represents the set of
objects that may be returned by the currently analyzed
method. All nodes in the return set escape to the caller
of the analyzed method.


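Both $O$ and $I$ are graphs with edges labeled with a field
from $F$. We define the following operations on nodes of the
graphs:


\begin{align*}
\mathit{edgesTo}(I, n) &= \{(v, n) \in I\} \cup \{((n', f), n) \in I\} \\
\mathit{edgesFrom}(I, v) &= \{(v, n) \in I\} \\
\mathit{edgesFrom}(I, n) &= \{((n, f), n') \in I\} \\
\mathit{edges}(I, v) &= \mathit{edgesFrom}(I, v) \\
\mathit{edges}(I, n) &= \mathit{edgesTo}(I, n) \cup \mathit{edgesFrom}(I, n) \\
I(v) &= \{n \mid (v, n) \in I\} \\
I(n, f) &= \{n' \mid ((n, f), n') \in I\}
\end{align*}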
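
One possible in-memory layout for the inside edge set and a few of these operations is sketched below in Java. The representation is an assumption made for illustration only: nodes, variables, and fields are strings, the class InsideEdgesSketch and its method names are hypothetical, and edges are rendered as strings in edgesTo purely to keep the sketch short.

```java
import java.util.*;

final class InsideEdgesSketch {
    // Variable edges (v, n): variable -> target nodes.
    private final Map<String, Set<String>> varEdges = new HashMap<>();
    // Field edges ((n, f), n'): source node -> field -> target nodes.
    private final Map<String, Map<String, Set<String>>> fieldEdges = new HashMap<>();

    void addVarEdge(String v, String n) {
        varEdges.computeIfAbsent(v, k -> new TreeSet<>()).add(n);
    }

    void addFieldEdge(String n, String f, String target) {
        fieldEdges.computeIfAbsent(n, k -> new HashMap<>())
                  .computeIfAbsent(f, k -> new TreeSet<>())
                  .add(target);
    }

    /** I(v): the nodes that variable v may point to. */
    Set<String> pointsTo(String v) {
        return varEdges.getOrDefault(v, Collections.<String>emptySet());
    }

    /** I(n, f): the nodes that field f of node n may point to. */
    Set<String> pointsTo(String n, String f) {
        return fieldEdges
                .getOrDefault(n, Collections.<String, Set<String>>emptyMap())
                .getOrDefault(f, Collections.<String>emptySet());
    }

    /** edgesTo(I, n): all edges that end at n, rendered as strings. */
    Set<String> edgesTo(String n) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> ve : varEdges.entrySet()) {
            if (ve.getValue().contains(n)) {
                result.add("(" + ve.getKey() + ", " + n + ")");
            }
        }
        for (Map.Entry<String, Map<String, Set<String>>> src : fieldEdges.entrySet()) {
            for (Map.Entry<String, Set<String>> fe : src.getValue().entrySet()) {
                if (fe.getValue().contains(n)) {
                    result.add("((" + src.getKey() + ", " + fe.getKey() + "), " + n + ")");
                }
            }
        }
        return result;
    }
}
```

Indexing field edges by source node and then by field makes $I(n, f)$ a pair of map lookups, at the cost of a full scan for edgesTo; an implementation that queries incoming edges often would likely keep a reverse index as well.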


For each node $n$, the escape function $e(n)$ and the return
set $r$ together record all of the different ways the node (and
the objects that it represents) may escape from the current
analysis scope. Here are the possibilities:


• If a parameter node $n_p \in e(n)$, then $n$ represents an
object that may be reachable from $p$.


• If a class node $n_{cl} \in e(n)$, then $n$ represents an
object that may be reachable from one of the static class
variables of the class $cl$.


• If a thread node $n_T \in e(n)$, then $n$ represents an object
that may be reachable from a runnable object
represented by $n_T$.


• If a method invocation site $m \in e(n)$, then $n$ represents
an object that may be reachable from the parameters
or the return value of an unanalyzed method invoked
at $m$.


• If $n \in r$, then $n$ represents an object that may be
returned to the caller of the analyzed method.


The escape information must satisfy the escape information
propagation invariant that if $n_1$ points to $n_2$, then $n_2$
escapes in at least all of the ways that $n_1$ escapes. We
formalize this invariant with the following inference rule, which
states that if there is an edge from $n_1$ to $n_2$, $e(n_1) \subseteq e(n_2)$.
When the analysis adds an edge to the points-to escape
graph, it may need to update the escape information.


$$\frac{((n_1, f), n_2) \in O \cup I}{e(n_1) \subseteq e(n_2)}$$


We say that a node $n_1$ violates the propagation
constraint if there is an edge from $n_1$ to $n_2$ and $e(n_1) \not\subseteq e(n_2)$.
During the analysis of a method, the algorithm may add
edges to the points-to escape graph. These edges may make
the updated nodes (the nodes that the new edges point from)
temporarily violate the propagation constraint. Whenever it
adds a new edge, the analysis uses the propagate($(O, I, e, r)$, $S$)
algorithm in Figure 5 to propagate escape information from
the nodes in $S$ to restore the invariant and produce a new
escape function $e'$ that satisfies the propagation constraint.
This algorithm takes a points-to escape graph and a set $S$
of nodes that may violate the constraint, then uses a
worklist approach to propagate the escape information from the
updated nodes to the other nodes in the graph.


propagate((O, I, e, r), S)
  // Initialize worklist and new escape function
  e' = e
  W = S
  while (W ≠ ∅) do
    // Remove a node n1 from the worklist
    W = W − {n1}
    // Propagate escape information to all nodes
    // that n1 points to
    for all ((n1, f), n2) ∈ O ∪ I do
      // Restore constraint for n2
      e'(n2) = e'(n2) ∪ e'(n1)
      if e'(n2) changed then
        W = W ∪ {n2}
  return(e')


Figure 5: Escape Information Propagation Algorithm 
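
A worklist propagation of this kind might look as follows in Java. This is a sketch under simplifying assumptions rather than the implementation used in the compiler: nodes and escape reasons are strings, the outside and inside edge sets are merged into a single successor map with field labels dropped, and the names EscapePropagationSketch and propagate are chosen here for illustration.

```java
import java.util.*;

final class EscapePropagationSketch {
    /**
     * @param edges  successor map: node -> nodes it points to (union of O and I,
     *               field labels ignored in this sketch)
     * @param escape current escape function e: node -> set of escape reasons
     * @param seeds  the set S of nodes that may violate the constraint
     * @return a new escape function e'; the input map is not modified
     */
    static Map<String, Set<String>> propagate(Map<String, Set<String>> edges,
                                              Map<String, Set<String>> escape,
                                              Set<String> seeds) {
        // e' = e (deep copy so the caller's sets stay untouched)
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> en : escape.entrySet()) {
            result.put(en.getKey(), new TreeSet<>(en.getValue()));
        }
        // W = S
        Deque<String> worklist = new ArrayDeque<>(seeds);
        while (!worklist.isEmpty()) {
            String n1 = worklist.remove();
            Set<String> from = result.getOrDefault(n1, Collections.<String>emptySet());
            for (String n2 : edges.getOrDefault(n1, Collections.<String>emptySet())) {
                Set<String> to = result.computeIfAbsent(n2, k -> new TreeSet<>());
                // e'(n2) = e'(n2) ∪ e'(n1); revisit n2 only if its set grew
                if (to.addAll(from)) {
                    worklist.add(n2);
                }
            }
        }
        return result;
    }
}
```

Because a node is re-queued only when its escape set grows, and each set is bounded by the finite set of escape reasons, the loop terminates even when the graph contains cycles.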


Given our abstraction of points-to escape graphs, we can 
define the concepts of escaped and captured nodes as follows: 


• escaped($(O, I, e, r)$, $n$) if $e(n) \neq \emptyset$ or $n \in r$, and
• captured($(O, I, e, r)$, $n$) if $e(n) = \emptyset$ and $n \notin r$.
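
These two predicates translate directly into code. The Java sketch below assumes, for illustration only, that the escape function is stored as a map from node names to sets of escape reasons and that a missing entry means the empty set; the class name is hypothetical.

```java
import java.util.*;

final class EscapePredicatesSketch {
    /** escaped: the escape set of n is non-empty, or n is in the return set r. */
    static boolean escaped(Map<String, Set<String>> e, Set<String> r, String n) {
        return !e.getOrDefault(n, Collections.<String>emptySet()).isEmpty() || r.contains(n);
    }

    /** captured: the escape set of n is empty and n is not in the return set r. */
    static boolean captured(Map<String, Set<String>> e, Set<String> r, String n) {
        return !escaped(e, r, n);
    }
}
```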


4 Actions 


The algorithm is designed to record various actions that the 
program performs on objects. Each action consists of an 
action label that identifies the kind of action performed, a 
node that represents the object on which the action was 
performed, and an optional thread node that represents the 
thread that performed the action. For the purposes of this
paper, the set of action labels is $b \in B = \{\mathrm{ld}, \mathrm{sync}\}$. The
set of actions is $a \in A = (B \times N) \cup (B \times N \times N_T)$. Here is
the meaning of the actions:


• $(\mathrm{sync}, n)$ records a synchronization action (either a
monitor acquire or release) performed by the current
thread on an object represented by $n$.


• $(\mathrm{sync}, n, n_T)$ records a synchronization action performed
by a thread represented by $n_T$.


• $(\mathrm{ld}, n)$ records a load by the current thread on an
escaped node. The result of the load is a reference to an
object represented by the load node $n$.


• $(\mathrm{ld}, n, n_T)$ records a load performed by the thread $n_T$.


It is straightforward to augment the set of action labels and 
the analysis to record arbitrary actions such as reading and 
writing objects or invoking a given method on an object. 
It is also straightforward to generalize the set of actions to 
include actions performed on multiple objects. 
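
As an illustration, actions could be represented in Java roughly as follows. The class names, the use of a null thread to mean "the current thread", and the taggedWith helper (which models how actions from a started thread are tagged with that thread's node when graphs are combined) are all assumptions of this sketch; in particular, keeping an existing tag rather than overwriting it is a choice made here, not something the text specifies.

```java
import java.util.*;

final class ActionSketch {
    /** The action labels B = {ld, sync}. */
    enum Label { LD, SYNC }

    static final class Action {
        final Label label;
        final String node;    // node the action is performed on
        final String thread;  // performing thread node, or null for the current thread

        Action(Label label, String node, String thread) {
            this.label = Objects.requireNonNull(label);
            this.node = Objects.requireNonNull(node);
            this.thread = thread;
        }

        /** Returns this action attributed to threadNode unless it already has a thread. */
        Action taggedWith(String threadNode) {
            return thread != null ? this : new Action(label, node, threadNode);
        }

        @Override public String toString() {
            String name = label.name().toLowerCase(Locale.ROOT);
            return thread == null
                    ? "(" + name + ", " + node + ")"
                    : "(" + name + ", " + node + ", " + thread + ")";
        }
    }
}
```

For example, tagging the action (sync, 0) with thread node 2 produces (sync, 0, 2), matching the form of the tagged actions in the example of Section 2.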


5 Parallel Interaction Graphs 


The algorithm uses a dataflow analysis to generate, at each 
program point in the method, a parallel interaction graph 
$(G, \tau, \alpha, \pi)$:


• $G$ is a points-to escape graph that summarizes the points-to and escape information for the current thread.

• The parallel thread map $\tau : N_T \to \{0, 1, 2\}$ counts the number of instances of each thread that may execute in parallel with the current thread at the current program point. If $\tau(n) = 1$, then at most one instance of $n$ may execute in parallel with the current thread at the current program point; if $\tau(n) = 2$, then multiple instances of $n$ may execute in parallel with the current thread at the current program point. We define the following operations:

  $x \oplus y = \min(2, x + y)$

  $x \ominus y = \begin{cases} 2 & \text{if } x \ge 2 \\ \max(0, x - y) & \text{otherwise} \end{cases}$

• The action set $\alpha \subseteq A$ records the set of actions executed by the analyzed computation.

• The parallel action relation $\pi \subseteq A \times N_T$ records ordering information between the actions of the current thread and threads that execute in parallel with the current thread. Specifically, $(a, n_T) \in \pi$ if $a$ may have happened after at least one of the thread objects represented by $n_T$ started executing. In this case, the actions of a thread object represented by $n_T$ may affect $a$.
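
The two operations on thread counts can be sketched as saturating arithmetic over $\{0, 1, 2\}$. The function names `oplus` and `ominus` are illustrative, and the sketch assumes that a count of 2 ("multiple instances") stays at 2 under subtraction, since removing one instance from an unbounded number still leaves possibly many.

```python
def oplus(x: int, y: int) -> int:
    """Saturating addition on thread counts in {0, 1, 2}."""
    return min(2, x + y)


def ominus(x: int, y: int) -> int:
    """Saturating subtraction; assumes a count of 2 ("many") stays at 2."""
    return 2 if x >= 2 else max(0, x - y)
```

Saturating at 2 keeps the domain finite, which the dataflow analysis needs in order to terminate.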


remove$((G, \tau, \alpha, \pi), S)$ denotes the parallel interaction graph obtained by removing all of the nodes in $S$ from $(G, \tau, \alpha, \pi)$.

The analysis uses parallel interaction graphs to compute 
the interactions between parallel threads. During the anal- 
ysis, one of the threads is the current thread; conceptually, 
nodes from the other threads move into the context of the 
current thread. In the analysis context of the current thread, all of the nodes from the other threads come from outside the current thread. The analysis models this by replacing each node from another thread with its corresponding version from outside the current thread. Given a parallel interaction graph $((O, I, e, r), \tau, \alpha, \pi)$, $((\bar{O}, \bar{I}, \bar{e}, \bar{r}), \bar{\tau}, \bar{\alpha}, \bar{\pi})$ is the parallel interaction graph with nodes replaced with the corresponding nodes from outside the current thread, defined as follows:


n ifneN 
n otherwise 


ele(n)) U a(e(n)) ifn € N 
€(n) = 0 ifne N 
a(e(n)) otherwise 
$\bar{r} = \{\mu(n) \mid n \in r\}$
aay = tT(n)@rT(n) ifneN 
0 otherwise 
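$\mu_A(a) = \begin{cases} (b, \mu(n)) & \text{if } a = (b, n) \\ (b, \mu(n), \mu(n_T)) & \text{if } a = (b, n, n_T) \end{cases}$
$\bar{\alpha} = \{\mu_A(a) \mid a \in \alpha\}$
$\bar{\pi} = \{(\mu_A(a), \mu(n)) \mid (a, n) \in \pi\}$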


The following operation removes a set of nodes $S$ from a parallel interaction graph $((O, I, e, r), \tau, \alpha, \pi)$.


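  $((O', I', e', r'), \tau', \alpha', \pi') = \mathrm{remove}(((O, I, e, r), \tau, \alpha, \pi), S)$

where

  $S' = N - S$
  $O' = O \cap ((S' \times F) \times S')$
  $I' = I \cap ((S' \times F) \times S')$
  $e'(n) = e(n) \cap (S' \cup M)$
  $r' = r \cap S'$
  $\tau'(n) = \begin{cases} \tau(n) & \text{if } n \in S' \\ 0 & \text{otherwise} \end{cases}$
  $\alpha' = \alpha \cap ((B \times S') \cup (B \times S' \times S'))$
  $\pi' = \pi \cap (((B \times S') \cup (B \times S' \times S')) \times S')$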

6 Intraprocedural Analysis 


The analysis of each method op starts with the initial parallel interaction graph $((O_0, I_0, e_0, r_0), \tau_0, \alpha_0, \pi_0)$, defined as follows:


• In $I_0$, each formal parameter points to its corresponding parameter node and each class points to its corresponding class node.

  $I_0 = \{(p, n_p) \mid p \in P\} \cup \{(cl, n_{cl}) \mid cl \in CL\}$

• The initial set of outside edges is empty: $O_0 = \emptyset$.

• The initial escape function $e_0$ is set up so that each parameter or class node is marked as escaping via itself.

  $e_0(n) = \begin{cases} \{n\} & \text{if } n \in N_P \cup N_C \\ \emptyset & \text{otherwise} \end{cases}$

• The initial return set and action set are empty: $r_0 = \emptyset$ and $\alpha_0 = \emptyset$.

• Initially there are no threads running in parallel with the current thread: $\forall n_T \in N_T . \tau_0(n_T) = 0$.

• The initial parallel action relation is empty: $\pi_0 = \emptyset$.


The algorithm analyzes the method under the assump- 
tion that the parameters and static class variables all point 
to different objects. If the method may be invoked in a call- 
ing context in which some of these pointers point to the same 
object, this object will be represented by multiple nodes dur- 
ing the analysis of the method. In this case, the analysis 
described below in Section 9 will merge the corresponding 
outside objects when it combines the final analysis result for 
the method into the calling context at the method invoca- 
tion site. Because the combination algorithm retains all of 
the edges from the merged objects, it conservatively models 
the actual effect of the method. 

The intraprocedural analysis is a dataflow analysis that 
propagates parallel interaction graphs through the statements of the method's control flow graph. The transfer function $((O', I', e', r'), \tau', \alpha', \pi') = [\![st]\!](((O, I, e, r), \tau, \alpha, \pi))$ defines the effect of each statement $st$ on the current parallel interaction graph. Most of the statements first kill a set
of inside edges, then generate additional inside and outside 
edges. Figure 6 graphically presents the rules that deter- 
mine the sets of generated edges for the different kinds of 
statements. Each row in this figure contains four items: a 


statement, a graphical representation of existing edges, a 
graphical representation of the existing edges plus the new 
edges that the statement generates, and a set of side condi- 
tions. The interpretation of each row is that whenever the 
points-to escape graph contains the existing edges and the 
side conditions are satisfied, the transfer function for the 
statement generates the new edges. We would like to point 
out several aspects of the intraprocedural analysis: 


• start Statements: At each start statement of the form $l.\mathrm{start}()$, $l$ may point to several thread nodes.
The analysis adds all of these nodes to the new parallel 
thread map. 


• Synchronization: The transfer function for synchronization statements adds a synchronization action to
the new parallel interaction graph. This synchroniza- 
tion action is recorded as executing in parallel with all 
of the thread nodes in the current parallel thread map. 
This ordering information is used later in the analysis 
to help determine if synchronization actions on a given 
node are independent. 


• Outside Edges: A load statement may add an outside edge to the current parallel interaction graph. In
this case, the transfer function also records the fact 
that the outside edge is created in parallel with all 
of the thread nodes in the parallel thread map. This 
ordering information is used during the thread interac- 
tion algorithm to ensure that these outside edges are 
not matched with inside edges from threads whose ex- 
ecution starts after the execution of the load statement 
that generated the outside edge. 


We next present the dataflow analysis framework for the intraprocedural analysis. This framework includes the
transfer functions for the basic statements and the definition 
of the confluence operator at merge points in the control-flow 
graph. 


6.1 Copy Statements 


A copy statement of the form $l = v$ makes $l$ point to the object that $v$ points to. The transfer function updates $I$ to reflect this change by killing the current set of edges from $l$, then generating additional inside edges from $l$ to all of the nodes that $v$ points to.

  $Kill_I = \mathrm{edges}(I, l)$
  $Gen_I = \{l\} \times I(v)$
  $I' = (I - Kill_I) \cup Gen_I$
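
As a small illustration of the kill/gen structure, the copy rule can be sketched in Python. The sketch assumes, for simplicity, that the variable edges of $I$ are stored separately as a dictionary `I_vars` from variable names to sets of nodes; this layout and the name `copy_stmt` are not from the paper.

```python
def copy_stmt(I_vars, l, v):
    """Transfer function sketch for l = v over variable edges only."""
    out = {var: set(nodes) for var, nodes in I_vars.items()}
    # Kill every edge from l, then generate edges from l to everything v points to.
    out[l] = set(I_vars.get(v, set()))
    return out
```

Because the old entry for `l` is overwritten rather than extended, the update is a strong update on the local variable.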


6.2 Load Statements 


A load statement of the form $l_1 = l_2.f$ makes $l_1$ point to the object that $l_2.f$ points to. The analysis models this change by constructing a set $S$ of nodes that represent all of the objects to which $l_2.f$ may point, then generating additional inside edges from $l_1$ to every node in this set.

All nodes accessible via inside edges from $l_2.f$ should clearly be in $S$. But if $l_2$ points to an escaped node, other parts of the program such as the caller or threads executing in parallel with the current thread can access the referenced object and store values in its fields. In particular, the value in $l_2.f$ may have been written by the caller or a thread running in parallel with the current thread. In other words, $l_2.f$ may contain a reference created outside of the current analysis scope. The analysis uses an outside edge to model this reference. The outside edge points to the load node for the load statement, which is the outside node that represents the objects that the reference may point to.

[Figure 6: Generated Edges for Basic Statements. Columns: Statement, Existing Edges, Generated Edges, Side Conditions. Rows: $l_1 = l_2.f$ (two cases), $l_1.f = l_2$, and $l = \mathrm{new}\ cl$. Legend: existing and generated inside edges, existing and generated outside edges. Side conditions: one marked node is the load node for $l_1 = l_2.f$; one marked node is escaped; one marked node is the inside node for $l = \mathrm{new}\ cl$; unmarked nodes may be inside or outside nodes.]

The analysis must therefore consider two cases: the case when $l_2$ does not point to an escaped node, and the case when $l_2$ does point to an escaped node. The algorithm determines which case applies by computing $S_E$, the set of escaped nodes to which $l_2$ points. $S_I$ is the set of nodes accessible via inside edges from $l_2.f$.

  $S_E = \{n_2 \in I(l_2) \mid \mathrm{escaped}((O, I, e, r), n_2)\}$
  $S_I = \bigcup \{I(n_2, f) \mid n_2 \in I(l_2)\}$

If $S_E = \emptyset$ (i.e., $l_2$ does not point to an escaped node), $S = S_I$ and the transfer function simply kills all edges from $l_1$, then generates inside edges from $l_1$ to all of the nodes in $S$.

  $Kill_I = \mathrm{edges}(I, l_1)$
  $Gen_I = \{l_1\} \times S$
  $I' = (I - Kill_I) \cup Gen_I$


If $S_E \neq \emptyset$ (i.e., $l_2$ points to at least one escaped node), $S = S_I \cup \{n_L\}$, where $n_L$ is the load node for the load statement. In addition to killing all edges from $l_1$, then generating inside edges from $l_1$ to all of the nodes in $S$, the transfer function also generates outside edges from the escaped nodes to $n_L$ and propagates the escape information from the escaped nodes through $n_L$. It also generates a load action $(\mathtt{ld}, n_L)$ and updates the parallel action relation to record the fact that the action may execute in parallel with thread objects represented by the current set of parallel thread nodes. In this case, the new outside edges may represent references created by these parallel thread objects.

  $Kill_I = \mathrm{edges}(I, l_1)$
  $Gen_I = \{l_1\} \times S$
  $I' = (I - Kill_I) \cup Gen_I$
  $Gen_O = (S_E \times \{f\}) \times \{n_L\}$
  $O' = O \cup Gen_O$
  $e' = \mathrm{propagate}((O', I', e, r), S_E)$
  $\alpha' = \alpha \cup \{(\mathtt{ld}, n_L)\}$
  $\pi' = \pi \cup (\{(\mathtt{ld}, n_L)\} \times \{n_T \mid \tau(n_T) > 0\})$
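
The two cases of the load rule can be sketched together in Python. The graph layout is an assumption made for this example: variable edges live in `g["vars"]`, field edges in `g["I"]` and `g["O"]`, and the helper `propagate` is a compact restatement of the worklist propagation of Figure 5.

```python
def propagate(edges, e, S):
    """Worklist propagation of escape sets along ((n1, f), n2) edges."""
    e_new = {n: set(s) for n, s in e.items()}
    worklist = set(S)
    while worklist:
        n1 = worklist.pop()
        for (src, _f), n2 in edges:
            if src == n1 and not e_new.get(n1, set()) <= e_new.setdefault(n2, set()):
                e_new[n2] |= e_new[n1]
                worklist.add(n2)
    return e_new


def load_stmt(g, l1, l2, f, n_L):
    """Transfer function sketch for l1 = l2.f, where n_L is the load node."""
    targets = g["vars"].get(l2, set())
    S_E = {n for n in targets if g["e"].get(n) or n in g["r"]}    # escaped targets
    S_I = {n for (n2, fld), n in g["I"] if n2 in targets and fld == f}
    out = dict(g)
    # Kill the old edges from l1, then point l1 at S.
    out["vars"] = {**g["vars"], l1: S_I | ({n_L} if S_E else set())}
    if S_E:
        out["O"] = g["O"] | {((n, f), n_L) for n in S_E}          # Gen_O
        out["e"] = propagate(out["O"] | g["I"], g["e"], S_E)
        action = ("ld", n_L)
        out["alpha"] = g["alpha"] | {action}
        out["pi"] = g["pi"] | {(action, nT) for nT, c in g["tau"].items() if c > 0}
    return out
```

When no target of `l2` is escaped, the sketch leaves `O`, `e`, `alpha`, and `pi` unchanged and never introduces the load node.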


6.3 Store Statements 


A store statement of the form $l_1.f = l_2$ finds the object to which $l_1$ points, then makes the $f$ field of this object point to the same object as $l_2$. The analysis models the effect of this assignment by finding the set of nodes that $l_1$ points to, then generating inside edges from all of these nodes to the nodes that $l_2$ points to. It also propagates the escape information from all of the nodes that $l_1$ points to through the nodes that $l_2$ points to.

  $Gen_I = (I(l_1) \times \{f\}) \times I(l_2)$
  $I' = I \cup Gen_I$
  $e' = \mathrm{propagate}((O', I', e, r), I(l_1))$
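
A minimal Python sketch of the store rule follows. It assumes variable edges are kept in a dictionary `vars_`, field edges in the sets `I` and `O`, and it restates the propagation of Figure 5 as a local helper; none of these names come from the paper.

```python
def propagate(edges, e, S):
    """Worklist propagation of escape sets along ((n1, f), n2) edges."""
    e_new = {n: set(s) for n, s in e.items()}
    worklist = set(S)
    while worklist:
        n1 = worklist.pop()
        for (src, _f), n2 in edges:
            if src == n1 and not e_new.get(n1, set()) <= e_new.setdefault(n2, set()):
                e_new[n2] |= e_new[n1]
                worklist.add(n2)
    return e_new


def store_stmt(vars_, I, O, e, l1, f, l2):
    """Transfer function sketch for l1.f = l2; returns (I', e')."""
    sources = vars_.get(l1, set())
    gen = {((n1, f), n2) for n1 in sources for n2 in vars_.get(l2, set())}
    I_new = I | gen                      # no kill: existing field edges are kept
    return I_new, propagate(O | I_new, e, sources)
```

Unlike the copy rule, the store rule has no kill set, so the new edges are added alongside whatever the field already pointed to.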


6.4 Acquire and Release Statements 


An acquire statement of the form acquire($l$) finds the object to which $l$ points, then acquires that object's lock. A release statement of the form release($l$) finds the object to which $l$ points, then releases that object's lock. The analysis models the effect of these statements by finding the set of nodes that $l$ points to, then recording synchronization actions on all of these nodes.

  $\alpha' = \alpha \cup (\{\mathtt{sync}\} \times I(l))$
  $\pi' = \pi \cup ((\{\mathtt{sync}\} \times I(l)) \times \{n_T \mid \tau(n_T) > 0\})$
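
The two equations above translate almost directly into set comprehensions. In this sketch the function name `sync_stmt` and the representation of actions as `("sync", n)` tuples are assumptions made for illustration.

```python
def sync_stmt(vars_, tau, alpha, pi, l):
    """Transfer function sketch for acquire(l) or release(l); returns (alpha', pi')."""
    actions = {("sync", n) for n in vars_.get(l, set())}
    running = {nT for nT, count in tau.items() if count > 0}
    # Each new action may execute in parallel with every currently running thread.
    return alpha | actions, pi | {(a, nT) for a in actions for nT in running}
```

Thread nodes whose count is 0 contribute nothing to the parallel action relation, which is what later allows the analysis to treat synchronization performed before any thread starts as independent.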


6.5 Object Creation Sites 


An object creation site of the form $l = \mathrm{new}\ cl$ allocates a new object and makes $l$ point to the object. The analysis represents all objects allocated at a specific creation site with the creation site's inside node $n$. The transfer function models the effect of the statement by killing all edges from $l$, then generating an inside edge from $l$ to $n$.

  $Kill_I = \mathrm{edges}(I, l)$
  $Gen_I = \{(l, n)\}$
  $I' = (I - Kill_I) \cup Gen_I$


6.6 Return Statements 


A return statement return $l$ specifies the return value for the method. The immediate successor of each return statement is the exit statement of the method. The analysis models the effect of the return statement by updating $r$ to include all of the nodes that $l$ points to.

  $r' = I(l)$


6.7 Control-Flow Join Points 


To analyze a statement, the algorithm first computes the 
join of the parallel interaction graphs flowing into the state- 
ment from all of its predecessors. It then applies the transfer 
function to obtain a new parallel interaction graph at the 
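point after the statement. The join operation $\sqcup$ is defined as follows.

  $((O, I, e, r), \tau, \alpha, \pi) = ((O_1, I_1, e_1, r_1), \tau_1, \alpha_1, \pi_1) \sqcup ((O_2, I_2, e_2, r_2), \tau_2, \alpha_2, \pi_2)$

where $O = O_1 \cup O_2$, $I = I_1 \cup I_2$, $\forall n \in N . e(n) = e_1(n) \cup e_2(n)$, $r = r_1 \cup r_2$, $\forall n \in N . \tau(n) = \max(\tau_1(n), \tau_2(n))$, $\alpha = \alpha_1 \cup \alpha_2$, and $\pi = \pi_1 \cup \pi_2$.

The corresponding partial order $\sqsubseteq$ is

  $((O_1, I_1, e_1, r_1), \tau_1, \alpha_1, \pi_1) \sqsubseteq ((O_2, I_2, e_2, r_2), \tau_2, \alpha_2, \pi_2)$

if $O_1 \subseteq O_2$, $I_1 \subseteq I_2$, $\forall n \in N . e_1(n) \subseteq e_2(n)$, $r_1 \subseteq r_2$, $\forall n \in N . \tau_1(n) \le \tau_2(n)$, $\alpha_1 \subseteq \alpha_2$, and $\pi_1 \subseteq \pi_2$. Bottom is $((\emptyset, \emptyset, e_\bot, \emptyset), \tau_\bot, \emptyset, \emptyset)$, where $\forall n \in N . e_\bot(n) = \emptyset$ and $\forall n \in N . \tau_\bot(n) = 0$.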
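
The join, the partial order, and the bottom element can be sketched componentwise in Python. The dictionary layout of a graph and the names `bottom`, `join`, and `leq` are assumptions made for this example; missing dictionary entries stand for the empty escape set and a thread count of 0.

```python
def bottom():
    """Least element: every component empty, every thread count 0."""
    return {"O": set(), "I": set(), "r": set(), "alpha": set(), "pi": set(),
            "e": {}, "tau": {}}


def join(g1, g2):
    """Componentwise least upper bound of two parallel interaction graphs."""
    out = {k: g1[k] | g2[k] for k in ("O", "I", "r", "alpha", "pi")}
    out["e"] = {n: g1["e"].get(n, set()) | g2["e"].get(n, set())
                for n in g1["e"].keys() | g2["e"].keys()}
    out["tau"] = {n: max(g1["tau"].get(n, 0), g2["tau"].get(n, 0))
                  for n in g1["tau"].keys() | g2["tau"].keys()}
    return out


def leq(g1, g2):
    """Partial order: g1 is below g2 in every component."""
    return (all(g1[k] <= g2[k] for k in ("O", "I", "r", "alpha", "pi"))
            and all(s <= g2["e"].get(n, set()) for n, s in g1["e"].items())
            and all(c <= g2["tau"].get(n, 0) for n, c in g1["tau"].items()))
```

Taking the maximum of the thread counts, rather than their saturating sum, reflects that only one of the two incoming control-flow paths is taken in any given execution.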


6.8 Analysis Results 


The analysis of each method produces analysis results $\beta(\bullet st)$ and $\beta(st \bullet)$ before and after each statement $st$ in the method's control flow graph. The analysis result $\beta$ satisfies the following equations:

  $\beta(\bullet \mathrm{enter}_{op}) = ((O_0, I_0, e_0, r_0), \tau_0, \alpha_0, \pi_0)$
  $\beta(\bullet st) = \bigsqcup \{\beta(st' \bullet) \mid st' \in \mathrm{pred}(st)\}$
  $\beta(st \bullet) = [\![st]\!](\beta(\bullet st))$

The final analysis result of method op is the analysis result at the program point after the exit node, i.e., $\beta(\mathrm{exit}_{op} \bullet)$.
As described below in Section 9.4, the analysis solves these 
equations using a standard worklist algorithm. 


7 Matching Inside and Outside Edges 


In the interprocedural analysis, outside edges in the callee’s 
parallel interaction graph represent inside edges in the caller’s 
parallel interaction graph. To compute the effect of a method 


call, the analysis matches the callee’s outside edges against 
the corresponding inside edges from the caller. In the inter- 
thread analysis, outside edges in the parallel interaction 
graphs of each thread represent inside edges in the parallel 
interaction graph of the other thread. To compute the inter- 
actions, the analysis matches outside edges from each thread 
against inside edges from the other thread. The matching 
process is conceptually similar in both cases. This section 
discusses the matching algorithm we use for both the inter- 
procedural and the inter-thread analyses. 

The matching algorithm takes two points-to escape graphs $(O_i, I_i, e_i, r_i)$ ($i \in \{1, 2\}$) and two initial mappings $\mu_i : N \to \mathcal{P}(N)$. It produces two new mappings $\mu'_i : N \to \mathcal{P}(N)$ that extend the initial mappings to map the outside nodes in each graph to the corresponding nodes in the other graph.

  $(\mu'_1, \mu'_2) = \mathrm{match}((O_1, I_1, e_1, r_1), (O_2, I_2, e_2, r_2), \mu_1, \mu_2)$


We formulate the mapping using set inclusion constraints [1]. 


This formulation enables us to present a compact, simple specification of the mapping result using a set of constraint rules. Figure 7 presents the constraints that the matching algorithm must satisfy. Note that these constraints use the notation $\bar{i}$ to represent the complement of $i$; i.e., $\bar{1} = 2$ and $\bar{2} = 1$. The constraints specify that if an outside
edge from one graph matches an inside edge from the other 
graph, then the mappings must map the outside edge’s node 
to the inside edge’s node. This node mapping potentially 
enables more edge matchings. 

Figure 9 presents the algorithm that solves these con- 
straints. It operates by repeatedly finding a node in one 
graph that is already mapped to a node in the other graph. 
It then checks if there is an inside edge from one of these 
nodes that it can match up with a corresponding outside 
edge from the other node. If so, it updates one of the map- 
pings to reflect the fact that the node that the outside edge 
points to is mapped to the node that the inside edge points 
to. 


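match($(O_1, I_1, e_1, r_1)$, $(O_2, I_2, e_2, r_2)$, $\mu_1$, $\mu_2$)
  Initialize worklists and results
  for $i = 1, 2$ do
    $\mu'_i = \mu_i$
    $W_i = \{(n_1, n_3) \mid n_3 \in \mu_i(n_1)\}$
    $D_i = \emptyset$
  while choose $(n_1, n_3) \in W_i$ do
    Remove a pair from worklist
    $W_i = W_i - \{(n_1, n_3)\}$
    $D_i = D_i \cup \{(n_1, n_3)\}$
    Check outside edges for rule 2
    for all $((n_1, f), n_2) \in O_i$ do
      for all $((n_3, f), n_4) \in I_{\bar{i}}$ do
        $\mu'_i(n_2) = \mu'_i(n_2) \cup \{n_4\}$
        if $(n_2, n_4) \notin D_i$ then
          $W_i = W_i \cup \{(n_2, n_4)\}$
    Check inside edges for rule 3
    for all $((n_1, f), n_2) \in I_i$ do
      for all $((n_3, f), n_4) \in O_{\bar{i}}$ do
        $\mu'_{\bar{i}}(n_4) = \mu'_{\bar{i}}(n_4) \cup \{n_2\}$
        if $(n_4, n_2) \notin D_{\bar{i}}$ then
          $W_{\bar{i}} = W_{\bar{i}} \cup \{(n_4, n_2)\}$
  return $(\mu'_1, \mu'_2)$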


Figure 9: Algorithm for Matching Inside and Outside Nodes 
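
The matching constraints can also be solved, less efficiently, by iterating the two rules until nothing changes. The sketch below uses that naive fixed-point iteration rather than the worklist of Figure 9; it is meant only to make rules 2 and 3 concrete. The edge encoding `((n1, f), n2)` and the dictionary-of-sets mappings are assumptions made for this example.

```python
def match(O1, I1, O2, I2, mu1, mu2):
    """Naive fixed-point solver for the matching constraints; returns (mu1', mu2')."""
    O, I = {1: O1, 2: O2}, {1: I1, 2: I2}
    mu = {1: {n: set(s) for n, s in mu1.items()},   # rule 1: mu_i ⊆ mu'_i
          2: {n: set(s) for n, s in mu2.items()}}
    changed = True
    while changed:
        changed = False
        for i, j in ((1, 2), (2, 1)):                # j plays the role of the complement of i
            for (n1, f), n2 in O[i]:                 # rule 2: outside edge in graph i
                for (n3, g), n4 in I[j]:
                    if f == g and n3 in mu[i].get(n1, ()) and n4 not in mu[i].get(n2, ()):
                        mu[i].setdefault(n2, set()).add(n4)
                        changed = True
            for (n1, f), n2 in I[i]:                 # rule 3: inside edge in graph i
                for (n3, g), n4 in O[j]:
                    if f == g and n3 in mu[i].get(n1, ()) and n2 not in mu[j].get(n4, ()):
                        mu[j].setdefault(n4, set()).add(n2)
                        changed = True
    return mu[1], mu[2]
```

The loop terminates because each pass either adds at least one node to a mapping or stops, and the mappings are bounded by the finite set of nodes.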


8 Combining Points-To Escape Graphs 


Once the matching algorithm has set up the mapping be- 
tween inside and outside nodes in the two graphs, the combi- 
nation algorithm uses the mapping to generate a new points- 
to escape graph that reflects the final points-to and escape 
relationships generated by the interaction. 

The combination algorithm takes two points-to escape graphs $(O_i, I_i, e_i, r_i)$ ($i \in \{1, 2\}$) and two initial mappings $\mu_i : N \to \mathcal{P}(N)$. It produces the final points-to escape graph $(O', I', e', r')$. The combination algorithm performs two basic tasks: it traces out reachable edges and nodes from the two input points-to escape graphs so that they are present in the final graph, and it uses the mappings between outside and inside nodes to translate inside edges into the final graph.

  $((O', I', e', r'), \mu'_1, \mu'_2) = \mathrm{combine}((O_1, I_1, e_1, r_1), (O_2, I_2, e_2, r_2), \mu_1, \mu_2)$


As for the mapping algorithm, we specify the combina- 
tion result using a system of set inclusion constraints. Fig- 
ure 8 presents the constraints that specify the result of the 
combination algorithm. These constraints extend the initial mappings to two new mappings $\mu'_i : N \to \mathcal{P}(N)$. These new mappings have the property that $n \in \mu'_i(n)$ if $n$ is reachable in the final points-to escape graph and should be present in that graph. These constraints start with a set of nodes
that will be mapped into the combined graph. They then 
trace out the reachable nodes to determine the complete 
set of nodes that should be present in the combined graph. 
Figures 10 and 11 present an algorithm for solving the con- 
straint system in Figure 8. We highlight several properties 
of this algorithm: 


• Base Nodes: The analysis starts with a set of base
nodes mapped into the combined graph. In the case of 
caller/callee interaction, all of the nodes from the caller 
are mapped directly into the combined graph. The 
class nodes from the callee are also mapped directly, 
while the parameter nodes are mapped indirectly to 
model the semantics of method invocation. 


• Inside Nodes: An inside node from one of the parallel
interaction graphs is present in the combined graph if 
it is reachable from the base nodes. 


• Inside Edges: If two nodes are mapped into the combined graph, the analysis uses the mapping to translate inside edges between the two nodes into the graph.


• Load Nodes: A load node is mapped into the combined graph if it is reachable from an escaped base
node. 


• Outside Edges: Outside edges are translated into
the combined graph if the node that they come from 
is mapped into the graph and at least one of the nodes 
that it maps to is escaped in the graph. 


8.0.1 Constraint Solution Algorithm 


We next discuss the constraint solution algorithm in Fig- 
ures 10 and 11 for the constraint system in Figure 8. The al- 
gorithm directly reflects the structure of the inference rules. 
At each step it detects an inference rule antecedent that be- 
comes true, then takes action to ensure that the consequent 
is also true. 


Constraint

  $\mu_i(n) \subseteq \mu'_i(n)$    (1)

  $\dfrac{((n_1, f), n_2) \in O_i,\ ((n_3, f), n_4) \in I_{\bar{i}},\ n_3 \in \mu'_i(n_1)}{n_4 \in \mu'_i(n_2)}$    (2)

  $\dfrac{((n_1, f), n_2) \in I_i,\ ((n_3, f), n_4) \in O_{\bar{i}},\ n_3 \in \mu'_i(n_1)}{n_2 \in \mu'_{\bar{i}}(n_4)}$    (3)

Figure 7: Constraints for Matching Inside and Outside Edges


Constraint

  $\mu_i(n) \subseteq \mu'_i(n)$    (4)

  $\dfrac{((n_1, f), n_2) \in I_i,\ n \in \mu'_i(n_1),\ n_2 \in N_I \cup N_T}{n_2 \in \mu'_i(n_2)}$    (5)

  $\dfrac{((n_1, f), n_2) \in O_i,\ n \in \mu'_i(n_1),\ \mathrm{escaped}((O', I', e', \emptyset), n)}{n_2 \in \mu'_i(n_2),\ ((n, f), n_2) \in O'}$    (6)

  $\dfrac{((n_1, f), n_2) \in I_i}{(\mu'_i(n_1) \times \{f\}) \times \mu'_i(n_2) \subseteq I'}$    (7)

  $\dfrac{n \in e_i(n_1),\ n_2 \in \mu'_i(n_1)}{n \in e'(n_2)}$    (8)

  $\dfrac{((n_1, f), n_2) \in O' \cup I'}{e'(n_1) \subseteq e'(n_2)}$    (9)

Figure 8: Constraints for Combining Points-to Escape Graphs
The mapNode$(n_1, n, i)$ procedure in Figure 10 is invoked whenever the algorithm maps a node $n_1$ from points-to escape graph $i$ to a node $n$ in the new graph. It first matches inside edges involving $n_1$ in points-to escape graph $i$ to inside edges involving $n$ in the new graph. The procedure checks any edges to $n_1$ that have already been translated into the new graph to see if they should also be translated to point to $n$ in the new graph. It also checks all of the inside edges from $n_1$ to see if they should be translated into the new graph. The procedure then checks outside edges from $n_1$ to see if they should be translated into the new graph. Finally, the procedure updates the new escape function $e'$ to reflect the effect of the newly mapped nodes and newly translated edges.

Figure 11 presents the driver for the constraint solution algorithm. It maintains a worklist $W_i^I$ of inside nodes from graph $i$ that should be mapped into the new graph, and a worklist $W_i^O$ of outside edges that should be translated into the new graph if the edge's source node is escaped in the new graph. When the algorithm processes a node from $W_i^I$, it calls mapNode to map that node into the new graph. When the algorithm processes an outside edge from $W_i^O$, it translates the edge into the new graph and maps the edge's target node into the new graph.

There is a slight complication in the algorithm. As the algorithm executes, it periodically translates inside edges into the new graph. Whenever a node $n_1$ from graph $i$ is mapped to $n$, the algorithm translates each inside edge $((n_1, f), n_2)$ from graph $i$ into the new graph. This translation process inserts a corresponding inside edge from $n$ to each node that $n_2$ maps to; i.e., to each node in $\mu'_i(n_2)$. The algorithm must ensure that when it completes, there is one such edge for each node in the final set of nodes $\mu'_i(n_2)$. But when the algorithm first translates $((n_1, f), n_2)$ into the new graph, $\mu'_i(n_2)$ may not be complete. In this case, the algorithm will eventually map $n_2$ to more nodes, increasing the set of nodes in $\mu'_i(n_2)$. There should be edges from $n$ to all of the nodes in the final $\mu'_i(n_2)$, not just to those nodes that were present in $\mu'_i(n_2)$ when the algorithm mapped $((n_1, f), n_2)$ into the new graph.

The algorithm ensures that all of these edges are present in the final graph by building a set $\delta_i(n_2)$ of delayed actions. Each delayed action consists of a node in the new graph and a field in that node. Whenever the node $n_2$ is mapped to a new node $n'$ (i.e., the algorithm sets $\mu'_i(n_2) = \mu'_i(n_2) \cup \{n'\}$), the algorithm establishes a new inside edge for each delayed action. The new edge goes from the node in the action to the newly mapped node $n'$. These edges ensure that the final set of inside edges satisfies the constraints.


9 Interprocedural Analysis 


The interprocedural analysis algorithm propagates parallel 
interaction graphs from callees to callers. At thread start 
sites, the analysis adds the nodes that represent the started 
thread to the parallel thread map. It also marks the started 
thread nodes as escaping via themselves and propagates the 
escape information. At each method invocation site, the 
analysis has the option of either skipping the site or analyz- 
ing the site. If it skips the site, it marks all of the parameters 
and the return value as permanently escaping down into the 
site. 


mapNode($n_1$, $n$, $i$)
  if $(n_1, n, i) \notin D$ then
    $D = D \cup \{(n_1, n, i)\}$
    $\mu'_i(n_1) = \mu'_i(n_1) \cup \{n\}$
    Add delayed inside edges for rule 7
    $I' = I' \cup (\delta_i(n_1) \times \{n\})$
    $S = \{n' \in N \mid (n', f) \in \delta_i(n_1)\}$
    Check inside edges for rules 5 and 7
    for all $((n_1, f), n_2) \in I_i$
      Add inside edges for rule 7
      $I' = I' \cup (\{(n, f)\} \times \mu'_i(n_2))$
      $S = S \cup \{n\}$
      $\delta_i(n_2) = \delta_i(n_2) \cup \{(n, f)\}$
      Check conditions for rule 5
      if $n_2 \in N_I \cup N_T$
        $W_i^I = W_i^I \cup \{n_2\}$
    Check outside edges for rule 6
    for all $((n_1, f), n_2) \in O_i$
      $W_i^O = W_i^O \cup \{((n, f), n_2)\}$
    Update escape information for rules 8 and 9
    $e'(n) = e'(n) \cup e_i(n_1)$
    $e' = \mathrm{propagate}((O', I', e', r'), S)$

Figure 10: Algorithm for Mapping One Node to Another


combine($(O_1, I_1, e_1, r_1)$, $(O_2, I_2, e_2, r_2)$, $\mu_1$, $\mu_2$)
  Initialize worklists and results
  $D = \emptyset$
  for $i = 1, 2$ do
    $(W_i^I, W_i^O) = (\emptyset, \emptyset)$
    for all $n \in N$ do $\delta_i(n) = \emptyset$
    for all $n \in N$ do $\mu'_i(n) = \emptyset$
  $(O', I', r') = (\emptyset, \emptyset, \emptyset)$
  for all $n \in N$ do $e'(n) = \emptyset$
  Call mapNode for existing mappings
  for all $(n_1, n, i)$ such that $n \in \mu_i(n_1)$ do
    mapNode($n_1$, $n$, $i$)
  done = false
  do
    if choose $n \in W_i^I$ then
      $W_i^I = W_i^I - \{n\}$
      mapNode($n$, $n$, $i$)
    else if choose $((n, f), n_1) \in W_i^O$ such that escaped$((O', I', e', \emptyset), n)$ then
      $W_i^O = W_i^O - \{((n, f), n_1)\}$
      Update outside edges for rule 6
      $O' = O' \cup \{((n, f), n_1)\}$
      mapNode($n_1$, $n_1$, $i$)
    else done = true
  while not done
  return $((O', I', e', r'), \mu'_1, \mu'_2)$

Figure 11: Algorithm for Combining Points-to Escape Graphs


9.1 Thread Start Sites 


To simplify the presentation of the analysis, we assume that at each thread start site $l.\mathrm{start}()$, $I(l) \subseteq N_T$. This is almost invariably the case in practice: most threads are started in the same method in which they are allocated. The algorithm adds the thread nodes that $l$ points to into the set of parallel threads. It also makes the escape function for each node $n_T \in I(l)$ include $n_T$, and propagates the new escape information.

  $\tau'(n_T) = \begin{cases} \tau(n_T) \oplus 1 & \text{if } n_T \in I(l) \\ \tau(n_T) & \text{otherwise} \end{cases}$
  $e_T(n) = \begin{cases} e(n) \cup \{n\} & \text{if } n \in I(l) \\ e(n) & \text{otherwise} \end{cases}$
  $e' = \mathrm{propagate}((O, I, e_T, r), I(l))$
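
The thread start rule can be sketched in Python as follows. The sketch assumes variable edges are stored in a dictionary `vars_`, that `edges` holds the field edges of $O \cup I$, and that `propagate` is a compact restatement of Figure 5; the name `start_stmt` is illustrative.

```python
def propagate(edges, e, S):
    """Worklist propagation of escape sets along ((n1, f), n2) edges."""
    e_new = {n: set(s) for n, s in e.items()}
    worklist = set(S)
    while worklist:
        n1 = worklist.pop()
        for (src, _f), n2 in edges:
            if src == n1 and not e_new.get(n1, set()) <= e_new.setdefault(n2, set()):
                e_new[n2] |= e_new[n1]
                worklist.add(n2)
    return e_new


def start_stmt(vars_, edges, e, tau, l):
    """Transfer function sketch for l.start(); returns (tau', e')."""
    started = vars_.get(l, set())
    tau_new = dict(tau)
    e_T = {n: set(s) for n, s in e.items()}
    for nT in started:
        tau_new[nT] = min(2, tau_new.get(nT, 0) + 1)   # tau(nT) ⊕ 1
        e_T.setdefault(nT, set()).add(nT)              # nT escapes via itself
    return tau_new, propagate(edges, e_T, started)
```

Starting the same thread node repeatedly saturates its count at 2, so a start statement inside a loop is modeled as starting multiple instances.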




9.2 Skipped Method Invocation Sites 


The transfer function for a skipped method invocation site is defined as follows. Given a skipped method invocation site $m$ of the form $l = l_0.\mathrm{op}(l_1, \ldots, l_k)$ with return node $n_R$ and a current parallel interaction graph $((O, I, e, r), \tau, \alpha, \pi)$, the parallel interaction graph $((O', I', e', r'), \tau', \alpha', \pi') = [\![m]\!](((O, I, e, r), \tau, \alpha, \pi))$ after the site is defined as follows:

  $I' = (I - \mathrm{edges}(I, l)) \cup \{(l, n_R)\}$
  $O' = O$
  $e' = \mathrm{propagate}((O', I', e_m, r), S_m)$
  $r' = r$

where $S_m = \bigcup \{I(l_i) \mid 0 \le i \le k\}$ and

  $e_m(n) = \begin{cases} e(n) \cup \{m\} & \text{if } n \in S_m \text{ or } n = n_R \\ e(n) & \text{otherwise} \end{cases}$

Recall that the return node $n_R$ is an outside node used to represent the return value of the invoked method.


9.3 Analyzed Method Invocation Sites 


Given an analyzed method invocation site $m$ and a current parallel interaction graph $((O, I, e, r), \tau, \alpha, \pi)$, the new parallel interaction graph $((O', I', e', r'), \tau', \alpha', \pi') = [\![m]\!](((O, I, e, r), \tau, \alpha, \pi))$ after the site is defined as follows:

  $((O', I', e', r'), \tau', \alpha', \pi') = \bigsqcup \{\mathrm{mapUp}(((O, I, e, r), \tau, \alpha, \pi), m, op) \mid op \in \mathrm{callees}(m)\}$


Figure 12 presents the combination algorithm, which per- 
forms the following steps: 


• It retrieves the analysis result from the exit statement
of the invoked method op. 


• It builds an initial mapping. This mapping maps the
parameter nodes from the callee to the corresponding 
nodes in the caller that represent the actual parame- 
ters, and the class nodes to themselves. 


• It uses the initial mapping to match outside edges
in the callee to the corresponding inside edges in the 
caller. The result is a mapping from the outside nodes 
of the callee to the corresponding nodes in the caller. 


• It uses the result mapping to combine the caller and
callee graphs, generating the new parallel interaction 
graph. In addition to combining the points-to and es- 
cape information, the analysis must also translate the 
actions and parallel threads from the callee into the 
caller. It must also record the fact that all of the ac- 
tions from the callee occur in parallel with all of the 
parallel threads from the caller. 


The transfer function itself merges the combined results 
from all potentially invoked methods to derive the points-to 
escape graph at the point after the method invocation site. 


mapUp(({O, I, e, r),T, Qa, T), m™m, op) 
Assume m of the form 1 = 1lo.op(11,...,1x) 
Assume op has formal parameters p,,..-,p, 
Extract the analysis result from the invoked method 
((Or, In, er,1R), TR, AR, TR) = G(exitope) 
Compute initial mappings 
n} ifne No 
pela { i : otherwise 
{n} ifne No 
Mep(n) = 4 Th) ifn=np, 
) otherwise 
Match outside nodes from callee to caller nodes 
(1 ’ p2) = match((9, ibs e, r), (Or, Tr, eR; rR), Lo, Hop) 
Compute mappings for combination 


pui(n)U{n} if trR(n) > 0 orn Er or 
b(n) = dv EV.(v,n) ET 
p(n) otherwise 
fio(n) U{n} if tr(n) > 0 or 
pR(n) = n€rr—(NpUNz) 
p2(n) otherwise 


ep(n) = er(n) —P 
Combine points-to escape graphs 
((O', Ic, e3 rc), Hh, H>) = 
combine((O, T,e, r), (Or, Tr, eR, rR); b, LR) 
Generate the final set of inside edges and return set 
I’ = Io U(I ~edges(1)) U(x U ah(n)) 
nerTR 
r=r 
Compute new thread map 
t'(n) =7(n) @ Tr(n) 
Compute action mapping from callee to caller 
_ J {b} x p(n) if a = (b,n) 
pa(a) = { {bo} x po(n) x {nr} ifa=(b,n,nr) 
Combine actions from caller and callee 
a! =aU(U pa(a)) 
at€aR 
Compute new parallel action relation 
w=nrU( U pa(a) x {nr})U 
(anr err 
(U pala) x {rr-r(nr) > 0}) 
acaR 
Return the new parallel interaction graph 
return ((O’,I',e’,r’),7',a',7') 


Figure 12: Callee/Caller Interaction Algorithm 


9.4 Fixed-Point Analysis Algorithm 


Figure 13 presents the interprocedural fixed-point algorithm that the compiler uses to generate the interprocedural analysis results. It uses a worklist of pending statements to solve
the combined intraprocedural and interprocedural dataflow 
equations. At each step, it removes a statement from the 
worklist and updates the analysis results before and after 
the statement. If the analysis result after the statement 
changed, it inserts all of its successors (or for exit nodes, 
all of the callers of its method) into the worklist. As spec- 
ified, the algorithm is therefore both intraprocedural and 
interprocedural. 


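Initialize analysis results
for all $st \in ST$ do
  $\beta(\bullet st) = \beta(st \bullet) = ((\emptyset, \emptyset, e_\emptyset, \emptyset), \tau_0, \emptyset, \emptyset)$
    where $e_\emptyset(n) = \emptyset$ for all $n \in N$ and $\tau_0(n_T) = 0$ for all $n_T \in N_T$
for all $op \in OP$ do
  $\beta(\bullet \mathrm{enter}_{op}) = ((O_0, I_0, e_0, r_0), \tau_0, \alpha_0, \pi_0)$
Initialize the worklist
$W = \{\mathrm{enter}_{op} \mid op \in OP\}$
while ($W \neq \emptyset$) do
  Remove a statement from worklist
  $W = W - \{st\}$
  Process the statement
  $\beta(\bullet st) = \bigsqcup \{\beta(st' \bullet) \mid st' \in \mathrm{pred}(st)\}$
  $\beta(st \bullet) = [\![st]\!](\beta(\bullet st))$
  if $\beta(st \bullet)$ changed then
    Put potentially affected statements onto worklist
    $W = W \cup \mathrm{succ}(st)$
    if $st = \mathrm{exit}_{op}$ then
      $W = W \cup \mathrm{callers}(op)$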


Figure 13: Fixed-Point Analysis Algorithm 
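
The intraprocedural core of this worklist scheme can be sketched generically in Python. The sketch is parameterized over an arbitrary lattice (`join`, `bottom`) and transfer function, and it omits the interprocedural step that re-enqueues callers at exit nodes; the function name `solve` and its parameters are assumptions made for the example.

```python
def solve(entries, preds, succs, transfer, join, bottom, entry_value):
    """Worklist solver sketch; returns the (before, after) maps for each statement."""
    before, after = {}, {}
    worklist = list(entries)
    while worklist:
        st = worklist.pop()                       # remove a statement from the worklist
        value = entry_value if st in entries else bottom
        for p in preds.get(st, ()):               # join over all predecessors
            value = join(value, after.get(p, bottom))
        before[st] = value
        result = transfer(st, value)
        if result != after.get(st, bottom):       # result after st changed
            after[st] = result
            worklist.extend(succs.get(st, ()))    # re-examine successors
    return before, after
```

For example, with sets under union as the lattice and a transfer function that adds the statement's own name, a loop in the control flow graph is revisited until the sets stop growing.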


The order in which the algorithm analyzes methods can 
have a significant impact on the analysis. For non-recursive 
methods, a bottom-up analysis of the program yields the full 
result with one analysis per method. For recursive methods, 
the analysis results must be iteratively recomputed within 
each strongly connected component of the call graph us- 
ing the current best result until the analysis reaches a fixed 
point. 

It is possible to extend the algorithm so that it initially 
skips the analysis of method invocation sites. If the analysis 
result is not precise enough, it can incrementally increase 
the precision by analyzing method invocation sites that it 
originally skipped. The algorithm will then propagate the 
new, more precise result to update the analysis results at 
affected program points. 


10 Inter-thread Analysis 


The interprocedural algorithm described above generates a 
parallel interaction graph for every point in the program. 
This graph records all of the points-to relationships created 
by the current thread and all of the potential interactions 
of that thread with other threads. The thread interaction 
algorithm resolves the potential interactions by computing 
which interactions may actually occur between the current 
thread and the parallel threads that it (transitively) starts. 
The basic idea is to repeatedly match corresponding inside
and outside edges from parallel threads. The result is a single
parallel interaction graph that summarizes the combined
effect of the parallel threads on the points-to and escape
information at that point.


10.1 Thread Interaction 


We next discuss the interaction algorithm for parallel threads.
The interaction takes place between a starter thread (a thread
that starts a parallel thread) and a startee thread (the thread
that is started). The interaction algorithm is given the parallel
interaction graph $\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle$ from the starter
thread, a node $n_T$ that represents the startee thread, and a
run method $op$ with receiver object $p_0$ that runs when the
thread object represented by $n_T$ starts.


$$\langle\langle O', I', e', r'\rangle, \tau', \alpha', \pi'\rangle = interact(\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle, n_T, op)$$


Figure 14 presents the interaction algorithm. The algorithm 
performs the following steps: 


• It extracts the final analysis result for the startee thread.
This is the analysis result after the exit node of the 
startee thread’s run method. 


• It matches corresponding inside and outside edges from
the two threads. All of the outside edges from the star- 
tee thread participate in the matching. Outside edges 
from the starter participate only if they represent loads 
that may have occurred after the startee thread began 
its execution. The algorithm uses the event ordering 
information to determine which of these outside edges 
should participate. 


• It combines the two parallel interaction graphs to generate
the final parallel interaction graph. In addition
to combining the points-to and escape information, the 
analysis must also translate the actions and started 
threads from the two graphs into the combined graphs. 
It must also update the ordering information as fol- 
lows: 


— Each thread has ordering information that spec- 
ifies which of its actions may execute in parallel 
with the threads that it starts. For both threads, 
this ordering information is translated into the 
new points-to escape graph. 


— All actions from the starter thread that occur in 
parallel with the startee thread also occur in par- 
allel with all of the startee’s threads. 


— All actions from the startee thread occur in par- 
allel with all of the starter’s threads. 


Note that because the parallel interaction graphs rep- 
resent all potential interactions between threads, the algo- 
rithm can compute the actual interactions with a single pass 
over the two graphs. 
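
The matching step can be pictured with a deliberately simplified Python sketch. It assumes a single matching rule: if an outside edge $\langle\langle n_1, f\rangle, n_2\rangle$ of one thread and an inside edge $\langle\langle n_3, f\rangle, n_4\rangle$ of the other thread share the field $f$, and $n_1$ is already mapped to $n_3$, then $n_2$ is mapped to $n_4$. The escape conditions, the event ordering filter, and the reverse direction of the match are left out, and the names `match_edges` and `initial_mapping` are hypothetical.

```python
def match_edges(outside_edges, inside_edges, initial_mapping):
    """Map nodes reached by outside edges to nodes reached by inside edges.

    Edges are (source, field, target) triples. The loop iterates only over
    the mapping; it never reanalyzes the code of either thread.
    """
    mapping = {n: set(targets) for n, targets in initial_mapping.items()}
    changed = True
    while changed:
        changed = False
        for (n1, f, n2) in outside_edges:
            for (n3, g, n4) in inside_edges:
                if f == g and n3 in mapping.get(n1, ()):
                    targets = mapping.setdefault(n2, set())
                    if n4 not in targets:
                        targets.add(n4)
                        changed = True
    return mapping


# The startee loads this.f and then .g; the starter wrote both fields.
startee_outside = [("L1", "g", "L2"), ("p0", "f", "L1")]
starter_inside = [("T", "f", "A"), ("A", "g", "B")]
mapping = match_edges(startee_outside, starter_inside, {"p0": {"T"}})
```

Here the receiver parameter node `p0` is mapped to the thread node `T`, so the load nodes `L1` and `L2` resolve to the objects `A` and `B` that the starter stored.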


10.2 Resolution 


To generate the final parallel interaction graph that summa- 
rizes all of the interactions, the algorithm resolves all of the 
interactions between the parallel threads. The resolution 
algorithm repeatedly takes the current parallel interaction 
graph, chooses one of the startee threads, and computes 
the interactions between the current graph and the startee 
thread to derive a new current graph. It continues this pro- 
cess until it reaches a fixed point. In the absence of loop 
or recursively generated concurrency, the algorithm reaches 
a fixed point after processing each thread once. The result 


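interact($\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle$, $n_T$, $op$)
    Assume $p_0$ represents the receiver of $op$
    Extract the analysis result for the parallel thread
        $\langle\langle O_R, I_R, e_R, r_R\rangle, \tau_R, \alpha_R, \pi_R\rangle = G(exit_{op}\bullet)$
    Compute outside edges from starter thread that participate in mapping
        $O_{n_T} = \{\langle\langle n_1, f\rangle, n_2\rangle . \langle\langle ld, n_2\rangle, n_T\rangle \in \pi \text{ or } \langle\langle ld, n_2, n\rangle, n_T\rangle \in \pi\}$
    Compute initial mappings for the match
        $\mu_0(n) = \begin{cases} \{n\} & \text{if } n \in N_C \\ \emptyset & \text{otherwise} \end{cases}$
        $\mu_{op}(n) = \begin{cases} \{n\} & \text{if } n \in N_C \\ \{n_T\} & \text{if } n = n_{p_0} \\ \emptyset & \text{otherwise} \end{cases}$
    Match corresponding inside and outside edges
        $\langle\mu_1, \mu_2\rangle = match(\langle O_{n_T}, I, e, r\rangle, \langle O_R, I_R, e_R, r_R\rangle, \mu_0, \mu_{op})$
    Compute mappings for the combination
        $\mu_1'(n) = \begin{cases} \mu_1(n) \cup \{n\} & \text{if } \tau(n) > 0 \text{ or } n \in r \text{ or } \exists v \in V . \langle v, n\rangle \in I \\ \mu_1(n) & \text{otherwise} \end{cases}$
        $\mu_2'(n) = \begin{cases} \mu_2(n) \cup \{n\} & \text{if } \tau_R(n) > 0 \\ \mu_2(n) & \text{otherwise} \end{cases}$
        $e_R'(n) = e_R(n) - P$
    Combine the two parallel interaction graphs
        $\langle\langle O', I_C, e', r_C\rangle, \mu_1'', \mu_2''\rangle = combine(\langle O, I, e, r\rangle, \langle O_R, I_R, e_R', r_R\rangle, \mu_1', \mu_2')$
        $I' = I_C \cup \bigcup_{\langle v, n\rangle \in I} \{v\} \times \mu_1''(n)$
        $r' = \bigcup_{n \in r} \mu_1''(n)$
    Compute action mappings for combination
        $\mu_1^A(a) = \begin{cases} \{b\} \times \mu_1''(n) & \text{if } a = \langle b, n\rangle \\ \{b\} \times \mu_1''(n) \times \{n_T'\} & \text{if } a = \langle b, n, n_T'\rangle \end{cases}$
        $\mu_2^A(a) = \begin{cases} \{b\} \times \mu_2''(n) & \text{if } a = \langle b, n\rangle \\ \{b\} \times \mu_2''(n) \times \{n_T'\} & \text{if } a = \langle b, n, n_T'\rangle \end{cases}$
    Compute new parallel thread map
        $\tau_{n_T}(n) = \begin{cases} \tau(n) \ominus 1 & \text{if } n = n_T \\ \tau(n) & \text{otherwise} \end{cases}$
        $\tau'(n) = \tau_{n_T}(n) \oplus \tau_R(n)$
    Compute combined action sets
        $\alpha' = \Big(\bigcup_{a \in \alpha} \mu_1^A(a)\Big) \cup \Big(\bigcup_{\langle b, n\rangle \in \alpha_R} \{b\} \times \mu_2''(n) \times \{n_T\}\Big) \cup \Big(\bigcup_{\langle b, n, n_T'\rangle \in \alpha_R} \{b\} \times \mu_2''(n) \times \{n_T'\}\Big)$
    Compute new parallel action relation
        $\pi' = \Big(\bigcup_{\langle a, n_T'\rangle \in \pi} \mu_1^A(a) \times \{n_T'\}\Big) \cup \Big(\bigcup_{\langle a, n_T\rangle \in \pi} \mu_1^A(a) \times \{n_T' . \tau_R(n_T') > 0\}\Big) \cup \Big(\bigcup_{\langle a, n_T'\rangle \in \pi_R} \mu_2^A(a) \times \{n_T'\}\Big) \cup \Big(\bigcup_{a \in \alpha_R} \mu_2^A(a) \times \{n_T' . \tau_{n_T}(n_T') > 0\}\Big)$
    Return the new parallel interaction graph
        return $\langle\langle O', I', e', r'\rangle, \tau', \alpha', \pi'\rangle$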


Figure 14: Parallel Thread Interaction Algorithm 


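of the resolution algorithm $resolve(\langle G, \tau, \alpha, \pi\rangle)$ must satisfy
the following equation:


$$resolve(\langle G, \tau, \alpha, \pi\rangle) = \bigsqcup_{\langle n_T, op\rangle \in S} \{resolve(interact(\langle G, \tau_{n_T}, \alpha, \pi\rangle, n_T, op))\}$$


where
• $S = \{\langle n_T, op\rangle . \tau(n_T) > 0 \text{ and } op \in run(n_T)\}$


• $\tau_{n_T}(n) = \begin{cases} \tau(n) \ominus 1 & \text{if } n = n_T \\ \tau(n) & \text{otherwise} \end{cases}$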


To perform the resolution, the analysis requires information
about the correspondence between thread nodes and
the run method that executes when the start method is invoked
on an object that the node represents. Given a thread
node $n \in N_T$ (a node that represents runnable objects),
$run(n)$ is the set of run methods that may execute when the
start method is invoked on an object represented by $n$. In
the current compiler, $run(n)$ is computed using the declared
type of $n$. Figure 15 presents a fixed-point algorithm that
computes $resolve(\langle G, \tau, \alpha, \pi\rangle)$.


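resolve($\langle G, \tau, \alpha, \pi\rangle$)
    $\langle G', \tau', \alpha', \pi'\rangle = \langle G, \tau, \alpha, \pi\rangle$
    while there exists $n_T \in N_T$ such that
            $\tau'(n_T) > 0$ and $\langle n_T, \tau', \alpha', \pi'\rangle \notin S$ do
        choose $n_T$ such that $\tau'(n_T) > 0$ and $\langle n_T, \tau', \alpha', \pi'\rangle \notin S$
        $S = S \cup \{\langle n_T, \tau', \alpha', \pi'\rangle\}$
        $\langle G', \tau', \alpha', \pi'\rangle = \bigsqcup_{op \in run(n_T)} interact(\langle G', \tau'_{n_T}, \alpha', \pi'\rangle, n_T, op)$
        if $G'$ changed then $S = \emptyset$
    return $\langle G', \tau', \alpha', \pi'\rangle$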


Figure 15: Fixed-Point Resolution Algorithm 
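
The control structure of the resolution loop can be sketched in Python as follows. Everything about the encoding is an assumption made for illustration: the analysis state is any hashable value, and `started`, `run`, `interact`, and `join` are hypothetical callables standing in for the thread map, the run-method lookup, the algorithm of Figure 14, and the least upper bound. The sketch records processed (thread node, state) pairs and clears that record whenever the state changes.

```python
from functools import reduce


def resolve(state, started, run, interact, join):
    """Apply interact for started threads until the state stops changing."""
    processed = set()
    while True:
        pending = [nt for nt in started(state) if (nt, state) not in processed]
        if not pending:
            return state
        nt = pending[0]
        processed.add((nt, state))
        results = [interact(state, nt, op) for op in run(nt)]
        if not results:
            continue  # no run method known for this thread node
        new_state = reduce(join, results)
        if new_state != state:
            state = new_state
            processed = set()


# Toy model: a state is (facts, started threads); t1 starts t2.
contrib = {"t1": "a", "t2": "b"}
starts = {"t1": {"t2"}, "t2": set()}


def toy_interact(state, nt, op):
    facts, threads = state
    return (facts | {contrib[nt]}, (threads - {nt}) | frozenset(starts[nt]))


def toy_join(s1, s2):
    return (s1[0] | s2[0], s1[1] | s2[1])


result = resolve(
    (frozenset(), frozenset({"t1"})),
    started=lambda s: sorted(s[1]),
    run=lambda nt: [nt + ".run"],
    interact=toy_interact,
    join=toy_join,
)
```

In this toy model each thread is processed once, which mirrors the remark that, without loop or recursively generated concurrency, the fixed point is reached after processing each thread once.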


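Given a statement $st$, it is possible to compute a single
parallel interaction graph $\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle$ that completely
summarizes the points-to and escape relationships
after $st$ as follows:


$$\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle = trim(resolve(G(st\bullet)))$$
where
• $S = \{n \in N_I . e(n) - N_T = \emptyset\}$


• $trim(\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle) = remove(\langle\langle O, I, e', r\rangle, \tau, \alpha, \pi\rangle, S)$


• $e'(n) = e(n) - N_T$.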


The algorithm trims off any outside edges that come 
from nodes that escape only because they are reachable from 
thread nodes. It also removes all thread nodes from the es- 
cape function. The resolved graph already contains all of 
the possible interactions that may affect nodes that escape 
only via other threads. 
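
The effect of trimming can be shown with a small Python sketch. It follows only what the prose above states: thread nodes are removed from the escape function, and outside edges are dropped when their source is an inside node whose remaining escape set is empty. The paper's remove operation is not spelled out in this section, so the sketch may omit additional cleanup; the function name `trim` and the dictionary encoding of the escape function are assumptions.

```python
def trim(outside_edges, escape, inside_nodes, thread_nodes):
    """Drop outside edges from nodes that escape only via thread nodes.

    outside_edges -- set of (source, field, target) triples
    escape        -- dict: node -> set of nodes through which it escapes
    """
    # e'(n) = e(n) - N_T
    new_escape = {n: e - thread_nodes for n, e in escape.items()}
    # S = inside nodes whose escape set is empty once thread nodes are removed
    recaptured = {n for n in inside_nodes if not new_escape.get(n, set())}
    kept = {(n1, f, n2) for (n1, f, n2) in outside_edges if n1 not in recaptured}
    return kept, new_escape


edges = {("i1", "f", "l1"), ("i2", "f", "l2")}
escape = {"i1": {"t1"}, "i2": {"t1", "p"}}
kept, new_escape = trim(edges, escape, {"i1", "i2"}, {"t1"})
```

Node `i1` escaped only through the thread node `t1`, so its outside edge is removed; node `i2` also escapes through `p`, so its outside edge stays.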

For the program point at the exit of the main method,
the analysis may also trim off outside edges that come from
nodes that escape only because they are reachable from
static class variables. The resolved graph at this program
point contains all of the possible interactions that may affect
these nodes. In this graph, the only source of uncertainty
comes from nodes passed into or returned from unanalyzed
method invocation sites.


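$$\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle = trimMain(resolve(G(exit_{main}\bullet)))$$


where
• $S = \{n \in N_I . e(n) - (N_T \cup N_C) = \emptyset\}$


• $trimMain(\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle) = remove(\langle\langle O, I, e', r\rangle, \tau, \alpha, \pi\rangle, S)$


• $e'(n) = e(n) - (N_T \cup N_C)$.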


11 Independence Testing 


The independence testing algorithm finds objects that are 
captured at the end of a method, using either the resolved 
graph (as discussed in Section 10.2) or the single thread in- 
terprocedural analysis result (as discussed in Section 9). If 
an object is captured in either graph, it is unreachable out- 
side the method. In this case, the graph completely sum- 
marizes all of the actions that threads may perform on the 
object. The algorithm uses the parallel action relation to 
determine if two conflicting actions may occur concurrently. 
If not, the actions are independent and the compiler can 
eliminate all synchronization on the objects that the node 
represents. Given a parallel interaction graph $\langle G, \tau, \alpha, \pi\rangle$
from the exit node of a method and a captured node n, 
the algorithm in Figure 16 tests if all of the synchronization 
actions on n are independent. 


independent($\langle G, \tau, \alpha, \pi\rangle$, $n$)
    Check if the node is escaped in $G$
    if $escaped(G, n)$ then return false
    Find all sync actions on $n$
    $S = \{\langle sync, n\rangle \in \alpha\} \cup \{\langle sync, n, n_T\rangle \in \alpha\}$
    Check if one of the actions $a$ in $S$ may execute in
    parallel with a thread $n_T$ that also synchronizes on $n$
    if $\exists a \in S, n_T \in N_T . \langle a, n_T\rangle \in \pi$ and $\langle sync, n, n_T\rangle \in S$ then
        return false
    else return true


Figure 16: Independence Testing Algorithm 
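
The test in Figure 16 translates almost directly into Python. In the sketch below, actions are encoded as tuples `("sync", n)` or `("sync", n, nT)`, the parallel action relation is a set of `(action, thread node)` pairs, and the escape check is reduced to membership in a precomputed set of escaped nodes. This encoding and the parameter names are assumptions made for illustration.

```python
def independent(escaped_nodes, actions, parallel, thread_nodes, n):
    """Return True if all synchronization actions on n are independent."""
    # A node that escapes is not fully summarized by the graph.
    if n in escaped_nodes:
        return False
    # All sync actions on n, by the current thread or by a started thread.
    syncs = {a for a in actions if a[0] == "sync" and a[1] == n}
    # Conflict: some sync action on n may run in parallel with a thread
    # that also synchronizes on n.
    for a in syncs:
        for nt in thread_nodes:
            if (a, nt) in parallel and ("sync", n, nt) in syncs:
                return False
    return True


actions = {("sync", "n1"), ("sync", "n1", "t1"), ("sync", "n2")}
parallel = {(("sync", "n1"), "t1")}
threads = {"t1"}
n1_ok = independent(set(), actions, parallel, threads, "n1")
n2_ok = independent(set(), actions, parallel, threads, "n2")
```

For `n1`, the current thread's synchronization may run in parallel with thread `t1`, which also synchronizes on `n1`, so the synchronization must stay. For `n2`, only the current thread synchronizes, so the synchronization can be removed.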


The compiler tests all inside nodes for independence in
the analysis result $G(exit_{op}\bullet)$ for all methods $op$. It also
tests all inside nodes in the resolved analysis result
$trim(resolve(G(exit_{op}\bullet)))$ for methods $op$ that contain thread
creation sites and in the resolved analysis result
$trimMain(resolve(G(exit_{main}\bullet)))$ at the end of the main
method.


12 Resolving Outside Nodes 


It is possible to augment the algorithm so that it records,
for each outside node, all of the nodes that it represents during
the analysis. This information allows the algorithm to
go back to the analysis results generated at the various program
points and resolve each outside node to the set of inside
nodes that it represents during the analysis. The basic idea
is to generate a set of inclusion constraint systems. There is
one system $\Omega_m$ for each method invocation site $m$ and one
system $\Omega_{op}$ for each method $op$. These systems specify, for
each node $n$, a map set $\omega(n)$ of nodes that $n$ is mapped to
during the analysis of the corresponding method invocation
site or method. These systems are specified using set inclusion
constraints of the form $\omega(n_1) \subseteq \omega(n_2)$, which specifies
that the map set for $n_2$ includes the map set for $n_1$. We use
the notation $\omega_m(n)$ to indicate the solution of the constraint
system $\Omega_m$ for $n$, and similarly $\omega_{op}(n)$ for the solution of
$\Omega_{op}$ for $n$. The initial system $\Omega_0$ consists of the set of
constraints $\{n\} \subseteq \omega(n)$, which specifies that each node is in its
map set. The constraint systems can be solved by a simple
constraint propagation algorithm [24].
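
A simple propagation solver for constraints of this form can be sketched in Python. Each constraint $\omega(n_1) \subseteq \omega(n_2)$ is encoded as a pair `(n1, n2)`, and every map set starts out containing its own node, as the initial system requires. The function name `solve_inclusions` and the worklist strategy are assumptions made for illustration, not the solver of [24].

```python
from collections import defaultdict


def solve_inclusions(nodes, constraints):
    """Least solution of constraints omega(n1) <= omega(n2).

    constraints -- iterable of (n1, n2) pairs
    """
    omega = {n: {n} for n in nodes}  # initial system: {n} is in omega(n)
    supersets = defaultdict(set)
    for n1, n2 in constraints:
        supersets[n1].add(n2)
    worklist = list(nodes)
    while worklist:
        n1 = worklist.pop()
        for n2 in supersets[n1]:
            if not omega[n1] <= omega[n2]:
                omega[n2] |= omega[n1]
                worklist.append(n2)  # omega(n2) grew, so repropagate from n2
    return omega


omega = solve_inclusions(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "b")])
```

The cycle between `b` and `c` forces both map sets to the same solution, and the contents of `omega["a"]` flow into both.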

We define the constraint systems using the mappings
generated by the algorithms in Figures 12 and 14. Specifically,
$\mu_{op}^{m}(n) = \mu(n)$, where $\mu$ is the mapping computed
by the algorithm in Figure 12 for the method $op$ invoked
at method invocation site $m$, and $\mu_{op}(n) = \mu_1(n) \cup \mu_2(n)$,
where $\mu_1$ and $\mu_2$ are the mappings computed by the algorithm
in Figure 14 when applied to the parallel interaction
graph at the program point $exit_{op}\bullet$. The constraint system
at a method invocation site $m$ is defined as follows:


$$\Omega_m = \{\omega(n_1) \subseteq \omega(n_2) . n_1 \in \mu_{op}^{m}(n_2)\} \cup \bigcup_{op \in callees(m)} \Omega_{op}$$


The constraint system is considered to be final for a node 
n if it completely summarizes all of the inside nodes that n 
can represent during the analysis of the method invocation 
site. To determine if the constraint system is final for n, 
the algorithm checks to see that all of the outside nodes 
that n represented during the analysis have been completely 
resolved to inside nodes. Formally, 


$$final(m, n) = \forall n' \in \omega_m(n), op \in callees(m) . \mu_{op}^{m}(n') \cap N_O = \emptyset$$


For each method op, the analysis can choose whether 
it wishes to compute the interactions between threads to 
resolve outside nodes. The advantage of doing so is a poten- 
tial increase in the precision; the disadvantage is a potential 
increase in the analysis time. If the analysis does not compute
the interactions between threads, $\Omega_{op}$ and $final(op, n)$
are defined as follows:


$$\Omega_{op} = \Omega_0 \cup \bigcup_{m \in invocations(op)} \Omega_m$$
$$final(op, n) = \forall m \in invocations(op) . final(m, n)$$


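Here $invocations(op)$ is the set of all method invocation sites
in $op$. If the analysis does compute the interactions, $\Omega_{op}$ and
$final(op, n)$ are defined as follows:


$$\Omega_{op} = \{\omega(n_1) \subseteq \omega(n_2) . n_1 \in \mu_{op}(n_2)\} \cup \Omega_0 \cup \bigcup_{m \in invocations(op)} \Omega_m$$
$$final(op, n) = \forall n' \in \omega_{op}(n) . \mu_{op}(n') \cap N_O = \emptyset$$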


The analysis can mix and match these two approaches on a 
per-method basis; an appropriate policy is to compute the 
interactions only for methods that start threads. 

If a node is final in a given constraint system, the analysis
has determined all of the inside nodes that it represents during
the analysis of the corresponding method invocation site
or method. More formally, if $final(m, n)$, then $\omega_m(n) \cap N_I$ is
the set of inside nodes represented by $n$ during the analysis
of the method invocation site $m$. Similarly, if $final(op, n)$,
then $\omega_{op}(n) \cap N_I$ is the set of inside nodes represented by
$n$ during the analysis of the method $op$.


13 Abstraction Relation


In this section, we characterize the correspondence between 
parallel interaction graphs and the objects and references 
created during the execution of the program. A key property 
of this correspondence is that a single concrete object in the 
execution of the program may be represented by multiple 
nodes in the parallel interaction graph. We therefore state 
the properties that characterize the correspondence using an 
abstraction relation, which relates each object to all of the 
nodes that represent it. 

As the program executes, it creates a set of concrete
objects $o \in C$ and a set of references $r \in R \subseteq (V \times C) \cup ((C \times F) \times C)$
between objects. At each point in the execution of
the program, it is possible to define the following sets of
references and objects:


• $R_C$ is the set of references created by the current execution
of the current method and all of the analyzed
methods that it invokes.


• $R_R$ is the set of references read by the current execution
of the current method and all of the analyzed
methods that it invokes.


• $C_R$ is the set of objects reachable from the local variables,
static class variables, and parameters by following
references in $R_C \cup R_R$.


• $R_I = R_C \cap ((C_R \times F \times C_R) \cup (V \times C_R))$ is the set of
inside references. These are the references represented
by the set of inside edges in the analysis.


• $R_O = (R_R \cap (C_R \times F \times C_R)) - R_I$ is the set of outside
references. These are the references represented by the
set of outside edges in the analysis.


It is always possible to construct an abstraction relation
$\rho \subseteq C \times N$ between the objects and the nodes in the parallel
interaction graph $\langle\langle O, I, e, r\rangle, \tau, \alpha, \pi\rangle$ at the current program
point. This relation relates each object to all of the nodes in
the points-to escape graph that represent the object during
the analysis of the method. The abstraction relation has all
of the properties described below.


• Reachable objects are represented by their allocation
sites. If $o$ was created at an object creation site within
the current execution of the current method or analyzed
methods that it invokes, and $o$ is reachable (i.e.
$o \in C_R$), then $n \in \rho(o)$, where $n$ is the object creation site’s
inside node.


• Each object is represented by at most one inside node:
- $n_1, n_2 \in \rho(o)$ and $n_1, n_2 \in N_I$ implies $n_1 = n_2$


• All outside references have a corresponding outside
edge in the points-to escape graph:
- $\langle\langle cl, f\rangle, o\rangle \in R_O$ implies $O(cl, f) \cap \rho(o) \neq \emptyset$
- $\langle\langle o_1, f\rangle, o_2\rangle \in R_O$ implies
$(\rho(o_1) \times \{f\}) \times \rho(o_2) \cap O \neq \emptyset$


• All inside references have a corresponding inside edge
in the points-to escape graph:
- $\langle v, o\rangle \in R_I$ implies $I(v) \cap \rho(o) \neq \emptyset$
- $\langle\langle cl, f\rangle, o\rangle \in R_I$ implies $I(cl, f) \cap \rho(o) \neq \emptyset$
- $\langle\langle o_1, f\rangle, o_2\rangle \in R_I$ implies
$(\rho(o_1) \times \{f\}) \times \rho(o_2) \cap I \neq \emptyset$


• If an object is represented by a captured node, it is
represented by only that node:
- $n \in \rho(o)$ and $captured(\langle O, I, e, r\rangle, n)$ implies
$\rho(o) = \{n\}$


Given this property, we define that an object is captured
if it is represented by a captured node. All references
to captured objects are either from local variables
or from other captured objects:
- $n \in \rho(o)$, $captured(\langle O, I, e, r\rangle, n)$, and $\langle v, o\rangle \in R$
implies $v \in L$
- $n_2 \in \rho(o_2)$, $captured(\langle O, I, e, r\rangle, n_2)$, and
$\langle\langle o_1, f\rangle, o_2\rangle \in R$ implies $\exists n_1 \in N . \rho(o_1) = \{n_1\}$
and $captured(\langle O, I, e, r\rangle, n_1)$


These properties ensure that captured objects are reachable
only via paths that start with the local variables.
If an object is captured at a method exit point, it will
therefore become inaccessible as soon as the method
returns.


• The points-to information in the points-to escape graph
completely characterizes the references between objects
represented by captured nodes:
- $captured(\langle O, I, e, r\rangle, n_1)$, $captured(\langle O, I, e, r\rangle, n_2)$,
$n_1 \in \rho(o_1)$, $n_2 \in \rho(o_2)$ and $\langle\langle n_1, f\rangle, n_2\rangle \notin I$ implies
$\langle\langle o_1, f\rangle, o_2\rangle \notin R$


14 Experimental Results 


We have implemented a combined pointer and escape anal- 
ysis based on the algorithm described in this paper. We 
implemented the analysis in the compiler for the Jalapeno 
JVM [8], a Java virtual machine written in Java with a few 
unsafe extensions for performing low-level system operations 
such as explicit memory management and pointer manipu- 
lation. 

The analysis is implemented as a separate phase of the 
Jalapeno dynamic compiler, which operates on the Jalapeno 
intermediate representation. To analyze a class, the algo- 
rithm loads the class, converts its methods into the interme- 
diate representation, then analyzes the methods. The final 
analysis results for the methods are written out to a file. 
This approach provides excellent support for dynamically 
loaded programs. It allows the compiler to analyze a large, 
commonly used package such as the Java Class Libraries 
once, then reuse the analysis results every time a program is
loaded that uses the package. It also supports the delivery 
of preanalyzed packages. Instead of requiring the analysis 
to be performed when the package is first loaded into a cus- 
tomer’s virtual machine, a vendor could perform the analysis 
as part of the release process, then ship the analysis results 
along with the code. 

Our benchmark set includes four programs: javac (Java 
compiler), javacup (parser generator), server (a simple mul- 
tithreaded web server), and work (a compute benchmark 
with multiple worker threads). Figure 17 presents the total 
number of synchronizations required to execute each pro- 
gram. We report counts for three different optimization lev- 
els: 


• Original: No analysis is performed.


• Interprocedural: The compiler uses the interprocedural,
single-threaded analysis results as defined in
Section 9. At the end of each method, it finds all cap- 
tured nodes and removes all synchronization on the 
corresponding objects from the counts. 


• Interthread: The compiler uses the inter-thread analysis
as defined in Section 10. In addition to the Interprocedural
optimization described above, the compiler
uses the thread interaction results. At the end of
each method that starts a thread, and at the end of 
the main method, it resolves the interactions between 
started parallel threads. If all of the synchronization 
actions on a captured node in the resulting parallel in- 
teraction graph are independent, the analysis removes 
all synchronization on the corresponding objects from 
the counts. 


Application    Original     Interprocedural    Interthread

javac          2,080,116    1,348,814           51,164
javacup        1,704,563      537,040          121,798
server             7,091        6,123            1,842
work              21,877       21,317            2,983


Figure 17: Total Number of Synchronization Operations 


For javac and javacup, the inter-thread optimization removes
over 92% of the total synchronizations. For server
and work, the inter-thread optimization removes over
74% of the synchronizations. In all of the cases, the Interthread
optimization significantly reduces the number of
synchronizations as compared to the Interprocedural opti- 
mization. To put these results in perspective, recent research 
with escape analysis algorithms less powerful than our Inter- 
procedural optimization level reported significant speedups 
from synchronization elimination for a range of Java pro- 
grams [12, 3]. 
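
The percentages quoted above can be rechecked from the counts in Figure 17 with a few lines of Python. The dictionary layout and the helper name `percent_removed` are assumptions made for illustration; the numbers are copied from the figure.

```python
# (Original, Interprocedural, Interthread) synchronization counts, Figure 17.
counts = {
    "javac": (2_080_116, 1_348_814, 51_164),
    "javacup": (1_704_563, 537_040, 121_798),
    "server": (7_091, 6_123, 1_842),
    "work": (21_877, 21_317, 2_983),
}


def percent_removed(original, remaining):
    """Percentage of the original synchronizations that were eliminated."""
    return 100.0 * (original - remaining) / original


interthread = {
    app: percent_removed(orig, inter) for app, (orig, _, inter) in counts.items()
}
interprocedural = {
    app: percent_removed(orig, proc) for app, (orig, proc, _) in counts.items()
}
```

With these counts, the Interthread level removes about 97.5% (javac), 92.9% (javacup), 74.0% (server), and 86.4% (work) of the synchronizations, which is consistent with the "over 92%" and "over 74%" figures in the text.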


15 Related Work 


In this section, we discuss several areas of related work: 
pointer analysis, escape analysis, and synchronization op- 
timizations. 


15.1 Pointer Analysis for Multithreaded Programs 


There have been, to our knowledge, two previously published
flow-sensitive pointer analysis algorithms for multithreaded
programs. Rugina and Rinard published an algorithm
for programs with structured, fork-join parallelism [37].
The algorithm is interprocedural, context-sensitive, and top-down,
generating calling contexts in a top-down manner
starting with the main method. Each procedure is reanalyzed
for each new calling context. Corbett published an
algorithm for multithreaded programs that consist of a single
procedure [16]. Both analyses use an iterative, fixed-point
algorithm to compute the interactions between parallel
threads, and must analyze the entire program. Our algorithm,
on the other hand, is a bottom-up, compositional,
interprocedural algorithm that analyzes each method once
to derive a parameterized analysis result that can be specialized
for use at all call sites that invoke the method.*
Unlike Corbett’s algorithm, it handles multiple procedures
and recursively generated concurrency. Unlike Rugina and
Rinard’s algorithm, it handles programs with unstructured
multithreading.

*Recursive methods require an iterative algorithm that may analyze
methods multiple times to reach a fixed point.

Rugina and Rinard’s algorithm propagates information
with three sets of edges: the current set of edges C, interference
edges I from other threads, and the set of edges E
created by the current thread. Separating C and E enables
the algorithm to perform strong updates to shared variables.
Strong updates eliminate edges from C, leaving them in E to 
be correctly observed by other threads. There are two ways 
to extend the algorithm presented in this paper to handle 
strong updates to heap allocated objects. The first is to 
allow the analysis to perform strong updates to captured 
objects [41]. The second is to adopt the Rugina and Rinard 
solution and split the set of inside edges in the parallel inter- 
action graphs into a set of current edges and a set of edges 
created by the current thread, with strong updates remov- 
ing edges from the set of current edges but leaving them in 
place in the set of edges created by the current thread. 

The interference edges in Rugina and Rinard’s analy- 
sis correspond to inside edges from parallel threads in the 
analysis presented in this paper. In general, the set of in- 
terference edges coming into a current thread depends on 
interactions between that current thread and threads that 
run in parallel with it. Rugina and Rinard’s analysis uses a 
fixed-point algorithm to resolve these interactions and com- 
pute a complete set of interference edges for the analysis of 
each thread. This algorithm repeatedly reanalyzes threads 
until the sets of interference edges from parallel threads do 
not change. The analysis presented in this paper takes a 
different approach. It uses outside edges and nodes to rep- 
resent all potential interactions of the current thread with 
other parallel threads. These outside edges and nodes en- 
able the analysis to conceptually derive a complete set of in- 
terference edges from parallel threads without a fixed-point 
algorithm. Instead, the analysis matches inside and outside 
edges to compute the interactions without reanalyzing each 
thread. 

In general, the analysis of multithreaded programs is a 
relatively unexplored field. There is an awareness that mul- 
tithreading significantly complicates program analysis [31], 
but a full range of standard techniques have yet to emerge. 
Grunwald and Srinivasan present a dataflow analysis framework
for reaching definitions for explicitly parallel programs [25],
and Knoop, Steffen and Vollmer present an efficient dataflow
analysis framework for bit-vector problems such as liveness, 
reachability and available expressions, but neither frame- 
work applies to pointer analysis [29]. In fact, the application 
of these frameworks for programs with pointers would re- 
quire pointer analysis information. Zhu and Hendren present 
a set of communication optimizations for parallel programs 
that use information from their pointer analysis; this analy- 
sis uses a flow-insensitive analysis to detect pointer variable 
interference between parallel threads [44]. Hicks also has 
developed a flow-insensitive analysis specifically for a mul- 
tithreaded language [27]. 


15.2 Escape Analysis for Multithreaded Programs 


There have been, to our knowledge, four previously published
escape analysis algorithms for multithreaded programs [41,
10, 12, 15]. All of these algorithms use the escape information
for stack allocation and synchronization elimination.
They only analyze single threads, and are designed to find
objects that are accessible to only the current thread. If an
object escapes the current thread, either to another thread
or by being written into a static class variable, it is marked
as globally escaping, and there is no attempt to recapture 
the object by analyzing the interactions between the threads 
that access the object. These algorithms are therefore fun- 
damentally sequential program analyses that have been ad- 
justed to ensure that they operate conservatively in the pres- 
ence of parallel threads. The algorithm presented in this 
paper, on the other hand, is designed to analyze the inter- 
actions between parallel threads. Unlike all other previously 
published algorithms, it is capable of extracting precise es- 
cape information even for objects that are accessible to mul- 
tiple threads. 


15.3 Pointer Analysis for Sequential Programs 


Pointer analysis for sequential programs is a relatively ma- 
ture field. Flow-insensitive analyses, as the name suggests, 
do not take statement ordering into account, and often use 
an analysis based on some form of set inclusion constraints 
to produce a single points-to graph that is valid across the 
entire program [4, 40, 39]. Many flow-insensitive algorithms 
scale well to very large programs, in part because they gen- 
erate one analysis result instead of one per program point 
and in part because of highly optimized implementations 
of the inclusion constraint solution algorithms [24]. Because 
flow-insensitive analyses are insensitive to the order in which 
statements execute, they model all interleavings and extend 
trivially to multithreaded programs. Like many flow insen- 
sitive algorithms, we use set inclusion constraints as a fun- 
damental tool in our analysis. A difference is that our anal- 
ysis uses these constraints to formally specify the result of 
interactions between parallel interaction graphs during the 
interprocedural and inter-thread analyses, with each interac- 
tion generating its own constraint solution problem. The in- 
traprocedural analysis uses a standard fixed-point dataflow 
approach. Flow-insensitive analyses typically formulate the 
entire analysis problem as a single collection of set inclusion 
constraints. 

Flow-sensitive analyses take the statement ordering into 
account, typically using a dataflow analysis to produce a 
points-to graph or set of alias pairs for each program point [38, 
35, 42, 23, 14, 30]. One approach analyzes the program in a 
top-down fashion starting from the main procedure, reana- 
lyzing each potentially invoked procedure in each new calling 
context [42, 23]. Another approach analyzes the program in 
a bottom-up fashion, extracting a single analysis result for 
each procedure. The result is reused at each call site that 
may invoke the procedure [38, 13]. Our algorithm builds on 
these previous approaches. It is extended to include escape 
and action ordering information and to explicitly represent 
potential interactions using outside edges and nodes. These 
extensions enable the algorithm to generalize in a straight- 
forward way to model interactions between parallel threads. 

Multithreading introduces one particularly subtle point.
Consider a load of a reference from an inside node, which 
represents an object created within the computation of the 
currently analyzed method. In a sequential program, the 
load would always return a reference created within the anal- 
ysis of the current method — because the inside node did 
not exist before the method was invoked, no unanalyzed 
code could have executed to write a reference into the ob- 
ject. But in a multithreaded program, an unanalyzed par- 
allel thread may write references into an object as soon as 
it escapes. The analysis for multithreaded programs must 
therefore assume that every load from an escaped object
may access a reference created outside the current analysis
scope. Our analysis deals with this possibility by using outside
edges and outside nodes to represent the results of loads
from escaped objects.

There is a similarity between the outside nodes in our 
analysis and invisible variables in previous analyses [30, 23, 
42]. Both outside nodes and invisible variables are used to 
represent objects from outside the analysis context during 
the analysis of a method or procedure. One difference is that 
when the analysis generates contexts in a top-down fashion, 
it has a complete characterization of the aliasing and points- 
to relationships involving all invisible variables. In this con- 
text, invisible variables primarily serve to enable the analysis 
to reuse analysis results for contexts with the same aliasing 
and points-to relationships between procedure parameters. 
Because our analysis is bottom-up, it knows nothing about 
the relationships involving objects represented by outside 
nodes. It therefore analyzes each method under the two as- 
sumptions that there are no aliases between outside nodes, 
and that every load from an escaped node may access a ref- 
erence created outside the method. One implication of this 
difference is that different invisible variables always repre- 
sent disjoint sets of objects, while different outside nodes 
may represent overlapping sets of objects. 

Invisible variables and outside nodes also support a par- 
ticular kind of precision in the analysis. Consider a method 
invoked at multiple call sites. It may be the case that an 
object is allocated inside the method, escapes the method, 
but is recaptured at each call site. In this case, the use 
of invisible variables or outside nodes enables the analysis 
to recognize that the object was recaptured. It can there- 
fore separate the different instantiations of the allocated ob- 
ject from each other in the analysis. To our knowledge, 
the only published analyses that can separate the different 
instantiations of the object are both flow sensitive and use 
some variant of the concept of invisible variables [30, 23, 42]. 
Context-insensitive analyses simply merge the information 
from the different call sites. In the absence of recursion, 
other context-sensitive analyses are capable of separating 
the different instantiations [32], but merge information from 
recursive call sites in a way that destroys the distinction be- 
tween multiple instantiations of the same variable in a re- 
cursive procedure [32]. A flow-insensitive, constraint-based 
analysis with polymorphic recursion may be able to sepa- 
rate the instantiations and recover this particular kind of 
precision. 


15.4 Escape Analysis 


There has been a fair amount of work on escape analysis 
in the context of functional languages [7, 5, 43, 18, 19, 9, 
26]. The implementations of functional languages create 
many objects (for example, cons cells and closures) implic- 
itly. These objects are usually allocated in the heap and 
reclaimed later by the garbage collector. It is often possible 
to use a lifetime or escape analysis to deduce bounds on the 
lifetimes of these dynamically created objects, and to per- 
form optimizations to improve their memory management. 

Deutsch [18] describes a lifetime and sharing analysis for 
higher-order functional languages. His analysis first trans- 
lates a higher-order functional program into a sequence of 
operations in a low-level operational model, then performs 
an analysis on the translated program to determine the life- 
times of dynamically created objects. The analysis is a 
whole-program analysis. Park and Goldberg [5] also describe 
an escape analysis for higher-order functional languages. 
Their analysis is less precise than Deutsch’s. It is, however, 
conceptually simpler and more efficient. Their main 
contribution was to extend escape analysis to include lists. 
Deutsch [19] later presented an analysis that extracts the 
same information but runs in almost linear time. Blanchet [9] 
extended this algorithm to work in the presence of impera- 
tive features and polymorphism. He also provides a correct- 
ness proof and some experimental results. 

Baker [7] describes a novel approach to higher-order 
escape analysis of functional languages based on the type 
inference (unification) technique. The analysis provides es- 
cape information for lists only. Hannan also describes a 
type-based analysis in [26]. He uses annotated types to de- 
scribe the escape information. He only gives inference rules 
and no algorithm to compute annotated types. 


15.5 Synchronization Optimizations 


Diniz and Rinard [20, 21] describe several algorithms for per- 
forming synchronization optimizations in parallel programs. 
The basic idea is to drive down the locking overhead by coa- 
lescing multiple critical sections that acquire and release the 
same lock multiple times into a single critical section that 
acquires and releases the lock only once. When possible, the 
algorithm also coarsens the lock granularity by using locks 
in enclosing objects to synchronize operations on nested ob- 
jects. Plevyak and Chien describe similar algorithms for 
reducing synchronization overhead in sequential executions 
of concurrent object-oriented programs [34]. 
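The coalescing idea can be sketched as a before-and-after pair in Java. This is a hand-written illustration under assumed names (CoalesceExample, updateSeparately, updateCoalesced), not the output of any of the cited algorithms. Both methods have the same sequential effect; the coalesced form acquires and releases the lock once instead of twice.

```java
// Hypothetical illustration; names are assumptions, not from the cited work.
final class CoalesceExample {
    private final Object lock = new Object();
    private int count;
    private int total;

    // Before: two adjacent critical sections on the same lock,
    // so the lock is acquired and released twice per call.
    void updateSeparately(int value) {
        synchronized (lock) {
            count++;
        }
        synchronized (lock) {
            total += value;
        }
    }

    // After: the two sections are coalesced into one, so the lock
    // is acquired and released only once per call.
    void updateCoalesced(int value) {
        synchronized (lock) {
            count++;
            total += value;
        }
    }

    int count() {
        synchronized (lock) {
            return count;
        }
    }

    int total() {
        synchronized (lock) {
            return total;
        }
    }
}
```

Coalescing only removes interleavings that were previously possible between the two sections, which is why it preserves the results of the original program while reducing locking overhead; the cost is that the lock is held for a longer stretch.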

Several research groups have recently developed synchro- 
nization optimization techniques for Java programs. Aldrich, 
Chambers, Sirer, and Eggers describe several techniques for 
reducing synchronization overhead, including synchroniza- 
tion elimination for thread-private objects and several opti- 
mizations that eliminate synchronization from nested moni- 
tor calls [2]. Blanchet describes a pure escape analysis based 
on an abstraction of a type-based analysis [10]. The imple- 
mentation uses the results to eliminate synchronization for 
thread-private objects and to allocate captured objects on 
the stack. Bogda and Hoelzle describe a flow-insensitive es- 
cape analysis based on global set inclusion constraints [12]. 
The implementation uses the results to eliminate synchro- 
nization for thread-private objects. A limitation is that the 
analysis is not designed to find captured objects that are 
reachable via paths with more than two references. 
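A thread-private object of the kind these techniques target can be sketched as follows. The class and method names (ThreadPrivateExample, join) are hypothetical; the only library fact relied on is that java.lang.StringBuffer declares its append and toString methods as synchronized.

```java
// Hypothetical illustration; names are assumptions, not from the cited work.
final class ThreadPrivateExample {
    // buf is allocated here, never stored in a field or static variable,
    // and never handed to another thread; only the String produced by
    // toString leaves the method. The object is therefore thread-private.
    static String join(String[] parts) {
        StringBuffer buf = new StringBuffer();
        for (String p : parts) {
            // StringBuffer.append is a synchronized method, so each call
            // acquires and releases the lock on buf even though no other
            // thread can ever contend for it.
            buf.append(p);
        }
        return buf.toString();
    }
}
```

Because no other thread can reach buf, an escape analysis that proves this may let the compiler drop the lock operations inside append without changing the program's behavior.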

Choi, Gupta, Serrano, Sreedhar, and Midkiff present a 
compositional dataflow analysis for computing reachability 
information [15]. The analysis results are used for synchro- 
nization elimination and stack allocation of objects. Like 
the analysis presented in this paper, it uses an extension 
of points-to graphs with abstract nodes that may represent 
multiple objects. It does not distinguish between inside and 
outside edges, but does contain an optimization, deferred 
edges, that is designed to improve the efficiency of the anal- 
ysis. The approach classifies objects as globally escaping, 
escaping via an argument, and not escaping. Because the 
primary goal was to compute escape information, the anal- 
ysis collapses globally escaping subgraphs into a single node 
instead of maintaining the extracted points-to information. 
Our analysis retains this information, which is crucial for de- 
veloping a pointer analysis algorithm that takes interactions 
between threads into account. 
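The three escape classifications mentioned above can be illustrated with a small Java sketch. The class and method names (EscapeStatesExample, globalEscape, argEscape, noEscape) are hypothetical, and the comments give an informal reading of each category rather than the precise definitions used in [15].

```java
// Hypothetical illustration; names are assumptions, not from the cited work.
final class EscapeStatesExample {
    // A static field is reachable from every thread.
    static Object shared;

    // Globally escaping: the new object is stored in a static field.
    static void globalEscape() {
        shared = new Object();
    }

    // Escaping via an argument: the new object becomes reachable from
    // the caller's array, but not (by this method alone) from globals.
    static void argEscape(Object[] holder) {
        holder[0] = new Object();
    }

    // Not escaping: the array is unreachable once the method returns,
    // so it is a candidate for stack allocation.
    static int noEscape() {
        int[] tmp = {1, 2, 3};
        int sum = 0;
        for (int v : tmp) {
            sum += v;
        }
        return sum;
    }
}
```

Whether the object created in argEscape ultimately escapes depends on what the caller does with holder, which is the kind of question a compositional analysis resolves when it combines the method's summary with the call site.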


16 Conclusion 


This paper presents a new combined pointer and escape 
analysis algorithm for unstructured multithreaded programs. 
It extends the current state of the art in two ways: it is 
the first interprocedural, flow-sensitive pointer analysis 
algorithm for unstructured multithreaded programs, and it is 
the first algorithm to extract precise escape analysis 
information for objects accessible to multiple threads. We have 
implemented the algorithm in the IBM Jalapeño virtual 
machine, and used the analysis results to perform a 
synchronization elimination optimization. Our experimental 
results show that, for our set of benchmark applications, the 
analysis can successfully remove between 75% and 95% of the 
total synchronizations. 

In the long run, we believe the most important concept 
in this research may turn out to be designing analysis al- 
gorithms from the perspective of extracting and represent- 
ing interactions between analyzed and unanalyzed regions of 
the program. This approach leads to clean, compositional 
algorithms that are capable of analyzing arbitrary parts of 
complete or incomplete programs. 